An Effective Statistical Approach to Blog Post Opinion Retrieval

Page 1: An Effective Statistical Approach to Blog Post  Opinion Retrieval

An Effective Statistical Approach to Blog Post

Opinion Retrieval

Ben He, Craig Macdonald, Jiyin He, Iadh Ounis

(CIKM 2008)

Page 2: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Introduction

Blogs have recently emerged as a new grassroots publishing medium.

A key feature that distinguishes blog content from other Web content is its subjective nature.

Bloggers tend to express opinions and comments towards some given targets, such as persons, organizations or products.

Page 3: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Introduction

Under the TREC opinion finding task, only a handful of groups achieved an improvement over their baseline, using techniques such as NLP or SVM classifiers.

These proposed approaches either involve considerable manual effort in collecting evidence for opinions, or lead to little improvement over a baseline that does not include any opinion finding feature.

Page 4: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Introduction

This paper proposes a statistical, light-weight, automatic dictionary-based approach.

It also shows that, despite its apparent simplicity, the approach provides statistically significant improvements over robust baselines, including the best TREC baseline run, without any manual effort.

Page 5: An Effective Statistical Approach to Blog Post  Opinion Retrieval

The Statistical Dictionary-based Approach to Opinion Retrieval

1. Automatically generates a dictionary from the collection without requiring manual effort.

2. Assigns a weight to each term in the dictionary, which represents how opinionated the term is.

3. Assigns an opinion score to each document in the collection using the top weighted terms from the dictionary as a query.

4. Appropriately combines the opinion score with the initial relevance score produced by the retrieval baseline.

Page 6: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Dictionary Generation

To derive the dictionary, we filter out terms that are too frequent or too rare in the collection.

We remove these terms because a term that appears too many times in the collection probably carries too little information, while a term that appears too few times is probably too specific. In either case, it cannot be generalized to indicate opinion across different queries.

Page 7: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Dictionary Generation

We first rank all terms in the collection by their within-collection frequencies in descending order.

The terms whose ranks fall in the range (s·#terms, u·#terms) are selected for the dictionary.

We apply s = 0.00007 and u = 0.001.
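The rank-range filter described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_opinion_dictionary`, the input format (a mapping from term to collection frequency), and the alphabetical tie-breaking are all assumptions. For a hypothetical vocabulary of 1,000,000 terms, the reported values of s and u would keep the terms ranked strictly between 70 and 1,000.

```python
def build_opinion_dictionary(term_freqs, s=0.00007, u=0.001):
    """Keep terms whose frequency rank lies strictly inside (s*#terms, u*#terms).

    term_freqs: dict mapping each term to its within-collection frequency
    (hypothetical input format). Ranks are 1-based, most frequent term first.
    """
    # Sort by descending frequency; ties are broken alphabetically so the
    # result is deterministic (the tie-breaking rule is an assumption).
    ranked = sorted(term_freqs, key=lambda t: (-term_freqs[t], t))
    n_terms = len(ranked)
    lower, upper = s * n_terms, u * n_terms
    return [t for rank, t in enumerate(ranked, start=1) if lower < rank < upper]


# Toy vocabulary of 8 terms; s and u are scaled up so the range is non-empty.
toy_freqs = {f"t{i}": 9 - i for i in range(1, 9)}  # t1 is the most frequent
selected = build_opinion_dictionary(toy_freqs, s=0.25, u=0.75)
print(selected)
```

On a real collection the thresholds remove both the very frequent head of the ranking and the long tail of rare terms, leaving a mid-frequency band as the dictionary.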

Page 8: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Dictionary Generation

Page 9: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Term Weighting

D(Rel): the relevant document set.

D(opRel): the opinionated relevant document set.

For each term t in the opinion term dictionary, we measure w_opn(t), the divergence of the term’s distribution in D(opRel) from that in D(Rel).

This divergence value measures how strongly a term stands out in the opinionated documents, compared with all relevant documents.

The higher the divergence is, the more opinionated the term is.

Page 10: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Term Weighting

A commonly used measure for term weighting is the KL divergence from a term’s distribution in a document set to its distribution in the whole collection.
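The formula on this slide did not survive in the transcript. Applied to the two sets defined earlier, with D(opRel) as the document set and D(Rel) as the reference distribution, a per-term KL weight would plausibly take the form below. The maximum-likelihood estimate of each probability, and the symbol tf_D(t) for the frequency of t in a set D, are assumptions rather than something the slides state.

```latex
w_{opn}(t) = P(t \mid D(opRel)) \cdot \log_2 \frac{P(t \mid D(opRel))}{P(t \mid D(Rel))},
\qquad
P(t \mid D) = \frac{tf_D(t)}{\sum_{t'} tf_D(t')}
```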

Page 11: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Term Weighting

The KL divergence measure considers only the divergence from one distribution to the other, ignoring how frequently a term occurs in the opinionated documents.

As a result, the weights of the terms in the opinion dictionary may be biased towards terms that have high KL divergence values but carry little information in the opinionated document set D(opRel).

Page 12: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Term Weighting

An alternative is the Bo1 term weighting model, which measures how informative a term is in the set D(opRel) against D(Rel).

λ = tf_rel / N_rel
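The weighting formula on this slide did not survive in the transcript. The sketch below assumes the standard Bo1 (Bose-Einstein) form, w_opn(t) = tf_x · log2((1 + λ) / λ) + log2(1 + λ), where tf_x is taken to be the frequency of the term in D(opRel), tf_rel its frequency in D(Rel), and N_rel the number of documents in D(Rel). These readings of the symbols, and the function name `bo1_weight`, are assumptions.

```python
import math


def bo1_weight(tf_x, tf_rel, n_rel):
    """Bo1-style weight of a term (assumed form, see the text above).

    tf_x:   frequency of the term in the opinionated relevant set D(opRel)
    tf_rel: frequency of the term in the relevant set D(Rel)
    n_rel:  number of documents in D(Rel)
    """
    if tf_rel <= 0 or n_rel <= 0:
        # The term never occurs in D(Rel), so lambda is undefined; this sketch
        # simply gives such terms zero weight (a simplifying assumption).
        return 0.0
    lam = tf_rel / n_rel
    return tf_x * math.log2((1 + lam) / lam) + math.log2(1 + lam)


# With lambda = 1 both logarithms equal 1, so the weight is tf_x + 1.
example_weight = bo1_weight(tf_x=3, tf_rel=10, n_rel=10)
print(example_weight)
```

Unlike the KL weight, this form grows with tf_x, so a term must actually be frequent in the opinionated documents to receive a high weight.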

Page 13: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Generating the Opinion Score

We take the top X weighted terms from the opinion dictionary (X = 100 in the experiments) and submit them to the retrieval system as a query Q_opn.

The retrieval system assigns a relevance score to each document in the collection.

Such a relevance score reflects the extent to which the top weighted opinionated terms are informative in the document, capturing the overall opinionated nature of the document.

This is called the opinion score: Score(d, Q_opn).
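The step above can be sketched as follows. The scorer here is a deliberately simple TF-IDF stand-in, not the weighting model used in the experiments; the function names `top_terms` and `opinion_scores` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def top_terms(weights, x=100):
    """Return the x highest-weighted dictionary terms, i.e. the opinion query."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:x]]


def opinion_scores(docs, query_terms):
    """Score every document against the opinion query.

    docs: dict mapping a document id to its list of tokens.
    Uses a plain TF-IDF sum as a stand-in for the retrieval system's model.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                idf = math.log2((n_docs + 1) / (doc_freq[term] + 0.5))
                score += (tf[term] / len(tokens)) * idf
        scores[doc_id] = score
    return scores


toy_docs = {
    "d1": ["great", "awful", "phone"],
    "d2": ["phone", "battery", "spec"],
}
query = top_terms({"great": 2.0, "awful": 1.5, "phone": 0.1}, x=2)
scores = opinion_scores(toy_docs, query)
print(query, scores)
```

Because the opinion query is independent of the user's topic, these scores can be computed once for the whole collection and reused for every query.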

Page 14: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Score Combination

1. Linear combination:

2. Log. combination:
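The two formulas on this slide were lost in extraction. The sketch below assumes a linear form, (1 − a) · Score(d, Q) + a · Score(d, Q_opn), and a logarithmic form, Score(d, Q) − k / log2 P(op | d, Q_opn), where P(op | d, Q_opn) is a document's opinion score divided by the sum of opinion scores over the collection. Both forms and the function names are assumptions chosen to be consistent with the parameters a and k reported in the evaluation; they are not taken from the slides.

```python
import math


def linear_combination(rel_score, opn_score, a=0.25):
    """Assumed linear mix of the relevance score and the opinion score."""
    return (1 - a) * rel_score + a * opn_score


def log_combination(rel_score, opn_score, opn_score_sum, k=250.0):
    """Assumed logarithmic combination.

    opn_score_sum is the sum of opinion scores over all documents, used to
    turn one document's opinion score into a probability-like value.
    """
    if opn_score_sum <= 0:
        return rel_score
    p_op = opn_score / opn_score_sum
    if not 0.0 < p_op < 1.0:
        # log2 is undefined at 0 and equals 0 at 1; this sketch leaves the
        # relevance score unchanged in those edge cases (a simplification).
        return rel_score
    # log2(p_op) is negative, so -k / log2(p_op) is a positive bonus that
    # grows as the document becomes more opinionated.
    return rel_score - k / math.log2(p_op)


linear_example = linear_combination(10.0, 2.0, a=0.25)
log_example = log_combination(10.0, 1.0, 4.0, k=250.0)
print(linear_example, log_example)
```

In the linear form a controls how much weight the opinion score receives, while in the logarithmic form k scales a bonus added on top of the unchanged relevance score.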

Page 15: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Experiment: Data

Dataset: the Blog06 collection.

We use the permalinks, which are the blog posts and their associated comments.

Each term is stemmed using Porter’s English stemmer, and standard English stopwords are removed.

Page 16: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Experiment: Baseline

InLB document weighting model:

b=0.2337

Page 17: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Experiment: External Opinion Dictionary

We also manually generate a dictionary compiled from various external linguistic resources.

The dictionary contains approximately 12,000 English words, mostly adjectives, adverbs and nouns, which are supposed to be subjective.

In this paper, we refer to the manually edited dictionary as the external dictionary and to the automatically derived one as the internal dictionary.

Page 18: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Experiment: External Opinion Dictionary

Page 19: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Experiment: Evaluation

Page 20: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Experiment: Evaluation

We use the Bo1 term weighting method and set a = 0.25 and k = 250.

Page 21: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Conclusions and Future Work

This paper has proposed an effective and practical approach to retrieving opinionated blog posts without the need for manual effort.

The automatically generated internal dictionary provides retrieval performance as good as that obtained with an external dictionary manually compiled from various linguistic resources.

Page 22: An Effective Statistical Approach to Blog Post  Opinion Retrieval

Conclusions and Future Work

In the future:

1. Extend the work to detecting the polarity or orientation of the retrieved opinionated documents.

2. Study the connection of the opinion finding task to question answering.

Example: extracting the opinionated sentences within a blog post about a given target.

