Review Topic Discovery with Phrases Using the Pólya Urn Model
Geli Fei, Zhiyuan Chen, Bing Liu (University of Illinois at Chicago)
Presenter: Alan Akbik
IBM Research Almaden / Berlin Institute of Technology
Product Aspects
Large collection of product reviews
◦ Example domain: Smartphones
Task: Discover aspects that are being discussed in the reviews
◦ Battery - Battery life, AAA batteries ("The battery life of this smartphone is great." "It uses AAA batteries.")
◦ Screen - Screen size, touch screen
◦ Camera - Resolution, image quality
Topic Models
Widely used in review topic / aspect discovery
Most models regard each topic as a distribution over individual terms (unigrams)
Terms in each document are assigned to topics
◦ Documents assigned to topics via terms
The generation of topics is mostly governed by “higher order co-occurrence” (Heinrich 2009)
◦ i.e., how often words co-occur in different contexts
Topic Models
Major issue: individual words may not convey the same information as natural phrases
◦ e.g. "battery life" vs. "life"
This leads to three problems:
◦ Interpretability - Topics are hard for users to interpret unless they are domain experts
◦ Ambiguity - Hard to make direct use of the topical words
◦ False evidence - Causes extra or wrong co-occurrences in topic generation, leading to poorer topics
Possible Solutions (1)
Treat each whole phrase as one term
“The battery life of this smartphone is great”
<the> <battery_life> <of> <this> <smartphone> <is> <great>
Problems:
◦ Many phrases are very rare
◦ Important words are removed: "battery life" may not end up in the same topic as "battery", because we no longer observe their co-occurrence
Possible Solutions (2)
Keep individual words, add extra terms for phrases
"The battery life of this smartphone is great"
<the> <battery> <life> <battery_life> <of> <this> <smartphone> <is> <great>
Problems:
◦ False evidence still exists
◦ Many phrases are rare: "battery life" is much less frequent than "life", so it is unlikely to be ranked at the top of a topic
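The two tokenization alternatives above can be made concrete with a small sketch (not the paper's code; the `tokenize` helper and its mode names are hypothetical):

```python
def tokenize(sentence, phrases, mode):
    """Illustrates the tokenization choices for a sentence with a detected
    noun phrase. mode: 'words' (plain unigrams), 'phrase_only' (Solution 1),
    'words_plus_phrase' (Solution 2)."""
    tokens = sentence.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        matched = None
        for p in phrases:                    # greedy phrase match at position i
            pw = p.split()
            if tokens[i:i + len(pw)] == pw:
                matched = pw
                break
        if matched and mode != "words":
            if mode == "words_plus_phrase":
                out.extend(matched)          # keep the component words
            out.append("_".join(matched))    # add the phrase as one term
            i += len(matched)
        else:
            out.append(tokens[i])
            i += 1
    return out

sent = "the battery life of this smartphone is great"
print(tokenize(sent, ["battery life"], "phrase_only"))
# ['the', 'battery_life', 'of', 'this', 'smartphone', 'is', 'great']
```

With `words_plus_phrase`, both `battery`/`life` and `battery_life` appear, reproducing the false-evidence problem described above.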
Challenge
How to retain connections between phrases and words while removing wrong co-occurrences?
Related Work
Using n-grams in topic modeling (Mukherjee and Liu 2013; Mukherjee et al. 2013).
Identifying key phrases in the post-processing step based on the discovered topical unigrams (Blei and Lafferty 2009; Liu et al. 2010; Zhao et al. 2011).
Directly modeling word order in topic model (Wallach 2006; Wang et al. 2007).
◦ Breaking the "bag-of-words" assumption
◦ Although the "bag-of-words" assumption does not always hold, it offers a great computational advantage
◦ Our method still follows the "bag-of-words" assumption
Gibbs Sampling for LDA
One of the most commonly used inference techniques for topic models
Considers each term in the documents in turn
Samples a topic for the current term, conditioned on the topic assignments of all other terms
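A minimal collapsed Gibbs sampler for standard LDA, as a sketch of the procedure described above (not the paper's implementation; variable names are mine):

```python
import random
import numpy as np

def gibbs_lda(docs, V, K, alpha, beta, iters):
    """Collapsed Gibbs sampling for LDA. docs: list of lists of word
    ids in [0, V). Returns topic-word and doc-topic count matrices."""
    ndk = np.zeros((len(docs), K))      # doc-topic counts
    nkw = np.zeros((K, V))              # topic-word counts
    nk = np.zeros(K)                    # total count per topic
    z = []                              # topic assignment per token
    for d, doc in enumerate(docs):      # random initialization
        zd = []
        for w in doc:
            k = random.randrange(K)
            zd.append(k)
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]             # remove the current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # p(z = k | all other assignments), up to a constant
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = np.random.choice(K, p=p / p.sum())
                z[d][i] = k             # re-add with the sampled topic
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return nkw, ndk
```

The GPU-based model changes only the counting step: instead of incrementing a single count by 1, related terms are also promoted.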
Simple Pólya Urn Model (SPU)
Designed in the context of colored balls and urns
In the context of topic models:
◦ A ball with a certain color: a term
◦ The urn: contains a mixture of balls with various colors (terms)
Topic-word (topic-term) distribution is reflected by the proportion of balls with a certain color in the urn
Simple Pólya Urn Model (SPU)
Left: initial state
Middle: draw a ball of a certain color
Right: put two balls of the same color back
Self-reinforcing property known as “the rich get richer”
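The SPU scheme can be simulated in a few lines (a sketch; the urn contents and seed are illustrative):

```python
import random

def simple_polya_urn(initial_counts, draws, seed=0):
    """Simple Pólya urn: repeatedly draw a ball in proportion to current
    counts, then return it together with one extra ball of the same color.
    In topic models, colors are terms and counts are topic-term mass."""
    rng = random.Random(seed)
    urn = dict(initial_counts)          # color -> number of balls
    for _ in range(draws):
        colors = list(urn)
        c = rng.choices(colors, weights=[urn[x] for x in colors])[0]
        urn[c] += 1                     # "the rich get richer"
    return urn

urn = simple_polya_urn({"red": 1, "blue": 1}, draws=100)
```

Each draw adds one ball, so after 100 draws the urn holds 102 balls, typically skewed toward whichever color got ahead early.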
Generalized Pólya Urn Model (GPU)
GPU vs. SPU: apart from two balls with the same color being put back, a certain number of balls with some other colors are also put in the urn.
We call this the promotion of these colored balls
Using the idea in the sampling process:
◦ SPU: seeing “staff” under a topic only increases the chance of seeing it again under the same topic
◦ GPU: also increases the chance of seeing “hotel staff” under the topic
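The GPU count update can be sketched as follows (not the paper's code; the `promotion` map and `gpu_increment` helper are hypothetical names):

```python
def gpu_increment(nkw, k, term, promotion, amount=1.0):
    """Generalized Pólya urn update: assigning topic k to `term` adds the
    usual count, and also promotes related terms. `promotion` maps a term
    to a list of (other_term, fraction_of_amount) pairs."""
    nkw[k][term] = nkw[k].get(term, 0.0) + amount          # the drawn ball
    for other, frac in promotion.get(term, []):
        nkw[k][other] = nkw[k].get(other, 0.0) + frac * amount

# Seeing "staff" under topic 0 also promotes "hotel_staff" a little.
nkw = {0: {}}
gpu_increment(nkw, 0, "staff", {"staff": [("hotel_staff", 0.1)]})
```

With an empty promotion map this reduces exactly to the SPU update.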
Generalized Pólya Urn Model (GPU)
In our application:
◦ We use each whole phrase as a term to remove wrong co-occurrences
◦ And use GPU to regain the connection between phrases and words
Two directions of promotion:
◦ Word to phrase: when a topic is assigned to an individual word, phrases containing the word are promoted
◦ Phrase to word: when a topic is assigned to a phrase, each component word is promoted
Datasets and Preprocessing
Data sets:
◦ 30 categories of electronics reviews from Amazon (1,000 reviews in each category)
◦ Hotel reviews from TripAdvisor (101,234 reviews)
◦ Restaurant reviews from Yelp (25,459 reviews)
Preprocessing:
◦ Review sentences as documents
Standard topic models cannot discover product aspects well when directly applied to reviews (Titov and McDonald, 2008)
◦ Rule-based method for noun phrase detection (chosen for efficiency)
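The slides only say the noun-phrase detector is rule-based; one plausible sketch (my assumption, not the paper's actual rules) extracts adjective/noun runs ending in a noun from POS-tagged text:

```python
def noun_phrases(tagged):
    """Extract multi-word noun phrases from (word, POS) pairs: maximal
    runs of ADJ/NOUN tags, trimmed to end in a noun, of length >= 2.
    A simplified stand-in for the paper's rule-based detector."""
    phrases, run = [], []
    for word, pos in tagged + [(None, None)]:   # sentinel flushes last run
        if pos in ("ADJ", "NOUN"):
            run.append((word, pos))
        else:
            while run and run[-1][1] != "NOUN":  # phrase must end in a noun
                run.pop()
            if len(run) >= 2:
                phrases.append(" ".join(w for w, _ in run))
            run = []
    return phrases

tagged = [("the", "DET"), ("battery", "NOUN"), ("life", "NOUN"),
          ("of", "ADP"), ("this", "DET"), ("smartphone", "NOUN"),
          ("is", "VERB"), ("great", "ADJ")]
print(noun_phrases(tagged))  # ['battery life']
```

In practice the POS tags would come from a standard tagger; a fixed rule pass like this is much cheaper than full parsing, matching the efficiency argument above.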
Experiments
Four sets of experiments on 32 domains
◦ Baseline #1, LDA(w): without considering phrases
◦ Baseline #2, LDA(p): considers phrases, uses each whole phrase as a term
◦ Baseline #3, LDA(w_p): considers phrases, keeps individual component words, and adds phrases as extra terms
◦ LDA(p_GPU): Our proposed method
Parameter Setting
Use the same set of parameters for all experiments
◦ Set Dirichlet priors as in (Griffiths and Steyvers, 2004)
Set document-topic prior 𝛼 = 50/𝐾, where 𝐾 is the number of topics
Set topic-term prior 𝛽 = 0.1
◦ Set number of topics 𝐾 = 15
◦ Posterior inference was drawn after 2,000 Gibbs sampling iterations, with 400 burn-in iterations
Parameters for GPU Model
Not all words in a phrase are equally important
◦ e.g. "staff" is more important than "hotel" in "hotel staff"
Determine head nouns
◦ Following (Wang et al., 2007), we take the last word of a noun phrase to be the head noun
GPU promotion
◦ Word to phrase: promote a phrase by virtualcount when a topic is assigned to its head noun
◦ Phrase to word: promote 0.5 * virtualcount to the head noun and 0.25 * virtualcount to all other words when a topic is assigned to a phrase
◦ Set virtualcount = 0.1 empirically, based on how much to promote phrases
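The promotion scheme with these settings can be written out directly (a sketch; `promotions` and the `w1_w2` phrase encoding are my notation):

```python
import math

VIRTUAL_COUNT = 0.1  # the promotion amount from the slide

def promotions(term, phrase_vocab):
    """Return (promoted_term, amount) pairs for a topic assignment.
    Phrases are encoded as 'w1_w2_...'; the head noun is the last word."""
    if "_" in term:                       # phrase -> component words
        words = term.split("_")
        promo = [(words[-1], 0.5 * VIRTUAL_COUNT)]          # head noun
        promo += [(w, 0.25 * VIRTUAL_COUNT) for w in words[:-1]]
        return promo
    # word -> every phrase whose head noun is this word
    return [(p, VIRTUAL_COUNT) for p in phrase_vocab
            if p.split("_")[-1] == term]

print(promotions("hotel_staff", []))                    # phrase to words
print(promotions("staff", ["hotel_staff", "hotel_room"]))  # word to phrases
```

So assigning a topic to "hotel_staff" promotes "staff" by 0.05 and "hotel" by 0.025, while assigning a topic to "staff" promotes "hotel_staff" by 0.1 but leaves "hotel_room" untouched.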
Statistical Evaluation
Two commonly used evaluation statistics:
◦ Perplexity: measures the likelihood of unseen documents
◦ KL-divergence: measures the distinctiveness of topics
◦ Neither of them correlates well with human judgments
We use topic coherence (Mimno et al. 2011)
◦ It measures the degree of co-occurrence of topical words under a topic
◦ Has been shown to correlate quite well with human judgment
◦ Produces a negative value; the higher (closer to zero), the better
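The coherence measure of Mimno et al. (2011) can be computed as below (a sketch of the standard formula; it assumes every top word occurs in at least one document):

```python
import math

def topic_coherence(top_words, docs):
    """Topic coherence (Mimno et al. 2011): for each pair of top words,
    add log((co-document frequency + 1) / document frequency of the
    higher-ranked word). docs: list of sets of terms. Result is <= 0;
    higher means the top words co-occur more often."""
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            w_m, w_l = top_words[m], top_words[l]
            d_l = sum(1 for d in docs if w_l in d)           # doc freq
            d_ml = sum(1 for d in docs if w_l in d and w_m in d)  # co-doc freq
            score += math.log((d_ml + 1) / d_l)
    return score
```

On the evaluation side, "top 15 topical terms" corresponds to `top_words` containing a topic's 15 highest-probability terms.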
Statistical Evaluation
Topic Coherence using top 15 topical terms
Statistical Evaluation
Topic Coherence using top 30 topical terms
Human Evaluation
Done by two annotators in two sequential stages
◦ Topic labeling (Kappa score: 0.838)
◦ Topical-term labeling by computing precision@n (Kappa score: 0.846)
◦ We compute average p@15 and p@30 for each model on each domain
Human Evaluation
Human evaluation on five domains
◦ Hotel, Restaurant, Watch, Tablet, MP3Player
Example Topics
Example topics by LDA(w) and LDA(p_GPU)
Future Work
Design a topic quality metric for topics with phrases
Systematically set the amount of promotion based on the designed metric
Thank You!