Page 1: A Corpus for Entity Profiling in Microblog Posts

A Corpus for Entity Profiling in Microblog Posts

UNED NLP & IR Group

Madrid, Spain

ISLA, University of Amsterdam

Amsterdam, The Netherlands

LREC Workshop on Language Engineering for Online Reputation Management

May 26th, 2012 - Istanbul, Turkey

Edgar Meij, Andrei Oghina, Minh T. Bui, Mathias Breuss, Maarten de Rijke, Damiano Spina

Page 2: A Corpus for Entity Profiling in Microblog Posts

Introduction

• Online Reputation Management

– Public image of an entity in Online Media

– Entity = { brand, organization, company, person, product }

• Microblogging services (e.g. Twitter)

– People sharing thoughts about an entity

– Dynamic, Real-Time

• Human Language Technologies

– Aid to reputation managers

– Retrieval and Analysis of entity mentions

Page 3: A Corpus for Entity Profiling in Microblog Posts

Sentiment vs. Profiling

• Sentiment analysis

• Entity Profiling – “hot” topics that people talk about in the context of an entity

Page 4: A Corpus for Entity Profiling in Microblog Posts

Our task: Aspect identification

• @xbox_news here we go again, microsoft being jealous of sony again.

• I lov big Sony headphones .. I lov my #music 2 b more beautiful

• not surprising that @graypowell was out and about - he used to be a ’Field Verification & Operator Acceptance Engineer’ at Sony

Page 6: A Corpus for Entity Profiling in Microblog Posts

Goal

• Build manually annotated corpora

– Evaluate the task of entity profiling in microblog streams

Page 7: A Corpus for Entity Profiling in Microblog Posts

WePS-3 ORM Corpus

• Collection of tweets

• Disambiguated company names (e.g. apple fruit vs. Apple Inc.)

Page 8: A Corpus for Entity Profiling in Microblog Posts

WePS-3 ORM Corpus

• Pooling → Aspects

• Tweet annotation → Opinion targets

Page 10: A Corpus for Entity Profiling in Microblog Posts

Approach I: Pooling aspects

• Pooling methodology

– 4 Ranking Methods:

• TF.IDF [Salton and Buckley, 1988]

• Log-Likelihood Ratio [Dunning, 1993]

• Parsimonious Language Model [Hiemstra et al., 2004]

• Opinion target extraction using topic-specific subjective lexicons [Jijkoun et al., 2010]

– Top 10 terms

• Manual annotation
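
The pooling step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `tfidf_ranking` uses whitespace tokenization and a plain tf × log(N/df) weighting, and the function names, the `background_tweets` argument, and the alphabetical tie-breaking are all hypothetical choices made for the example. The other three ranking methods would each contribute one more ranked list to `pool_terms`.

```python
import math
from collections import Counter


def tfidf_ranking(entity_tweets, background_tweets):
    """Rank terms from an entity's tweets by a simple TF.IDF score.

    Assumption: each tweet is one document; tokens are lowercased
    whitespace-separated strings.
    """
    tf = Counter(tok for tweet in entity_tweets for tok in tweet.lower().split())
    docs = [set(t.lower().split()) for t in entity_tweets + background_tweets]
    n_docs = len(docs)
    scores = {}
    for term, freq in tf.items():
        # df >= 1 because every term comes from a tweet that is in docs
        df = sum(1 for d in docs if term in d)
        scores[term] = freq * math.log(n_docs / df)
    # Highest score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda t: (-scores[t], t))


def pool_terms(rankings, k=10):
    """Union of the top-k terms of each ranking, in first-seen order."""
    pooled, seen = [], set()
    for ranking in rankings:
        for term in ranking[:k]:
            if term not in seen:
                seen.add(term)
                pooled.append(term)
    return pooled
```

The pooled list is what annotators would then judge term by term, so a term proposed by several methods is only assessed once.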

Page 11: A Corpus for Entity Profiling in Microblog Posts

Aspects dataset: annotation example

Page 12: A Corpus for Entity Profiling in Microblog Posts

Aspects dataset: outcome

• Three annotators, substantial agreement (> 0.6 Cohen/Fleiss’ kappa)

• 94 entities, 17775 tweets, ≈177 tweets/entity

• 2455 terms, 1304 aspects (54.11%)
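
For reference, agreement figures like the kappa above can be computed along these lines. The sketch below covers only Cohen's kappa for a pair of annotators (Fleiss' kappa, which handles more than two annotators, is not shown); the function name and the toy labels in the usage note are assumptions for illustration and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)
```

With toy labels `[1, 1, 0, 0]` and `[1, 1, 0, 1]` (1 = "term is an aspect"), observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.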

Page 13: A Corpus for Entity Profiling in Microblog Posts

Approach II: Tweet annotation

• Opinion targets dataset

• Tweet-level annotation

– Is the tweet subjective?

• Phrase-level annotation

– Subjective phrase

– Opinion target phrase p:

• p is an aspect of the entity

• p is included in a sentence that contains a direct subjective phrase

• p is the target of the expressed opinion
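
The three conditions on an opinion target phrase p are conjunctive, which the following sketch makes explicit. The `PhraseAnnotation` record and its field names are hypothetical; they are not the corpus's actual annotation format.

```python
from dataclasses import dataclass


@dataclass
class PhraseAnnotation:
    """Hypothetical record of an annotator's judgments for one phrase."""
    text: str
    is_aspect_of_entity: bool
    in_sentence_with_subjective_phrase: bool
    is_target_of_opinion: bool


def is_opinion_target(p: PhraseAnnotation) -> bool:
    """A phrase counts as an opinion target only if all three criteria hold."""
    return (
        p.is_aspect_of_entity
        and p.in_sentence_with_subjective_phrase
        and p.is_target_of_opinion
    )
```

Under this reading, a phrase that is an aspect of the entity but appears in a purely factual sentence would not be marked as an opinion target.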

Page 14: A Corpus for Entity Profiling in Microblog Posts

Opinion Targets dataset: annotation example

Page 15: A Corpus for Entity Profiling in Microblog Posts

Opinion targets dataset: outcome

• 59 entities, 9396 tweets, ≈159 tweets/entity

• 15.16% of tweets with subjective phrases

• 13.82% of tweets with opinion targets

Page 16: A Corpus for Entity Profiling in Microblog Posts

Aspects vs. Opinion targets

[Venn diagram: Aspects and Terms in Opinion Targets; values shown: 1650, 783, 270; percentages shown: 26.69%, 12.67%]

Page 18: A Corpus for Entity Profiling in Microblog Posts

• Available at http://bitly.com/profilingTwitter

WePS-3 ORM Corpus

• Pooling → Aspects dataset

– 94 entities, 17,775 tweets, ≈177 tweets/entity

– 2455 terms, 1304 aspects (54.11%)

• Tweet annotation → Opinion targets dataset

– 59 entities, 9,396 tweets, ≈159 tweets/entity

– 15.16% of tweets with subjective phrases

– 13.82% of tweets with opinion targets

