Hierarchical aspect and sentiment model, Context-dependent conceptualisation

Hierarchical Aspect-Sentiment Model & Context-Dependent Conceptualization

Alice Oh alice.oh@kaist.edu http://uilab.kaist.ac.kr/ April 11, 2013

Overview

¤ Hierarchical Aspect-Sentiment Model (AAAI 2013)
¤  Suin Kim, et al.
¤  Collaboration with Microsoft Research Asia

¤ Context-Dependent Conceptualization (IJCAI 2013)
¤  Dongwoo Kim, Haixun Wang, Alice Oh
¤  Collaboration with Microsoft Research Asia

Users & Information Lab @ KAIST

Hierarchical Aspect-Sentiment Model (AAAI-13) Suin Kim, Jianwen Zhang, Zheng Chen, Alice Oh, and Shixia Liu

Hierarchical aspect-sentiment model

¤ Goal: To discover a hierarchy of aspects and associated sentiments from a corpus of online reviews

¤ Assumptions
¤  Each sentence expresses a single aspect and a single sentiment
¤  An aspect (e.g., “battery life”) consists of neutral, positive, and negative words

¤ Model: A hierarchical aspect-sentiment joint model using the recursive Chinese restaurant process (rCRP)

¤ Results
¤  A reasonable hierarchy of aspects discovered without supervision
¤  Sentiment classification accuracy comparable to other recent sentiment-aspect joint models
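The slides do not spell out the rCRP itself, so the following is only a minimal sketch of its building block, the ordinary Chinese restaurant process: each new customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to a concentration parameter. The function names and the `alpha` parameter are assumptions for illustration; the recursive, tree-structured version used by the model is not shown.

```python
import random

def crp_assign(table_counts, alpha, rng):
    """Sample a table index for the next customer under a CRP.

    Returns len(table_counts) when a new table is opened.
    """
    # Existing tables are weighted by size, the new table by alpha.
    weights = list(table_counts) + [alpha]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def simulate_crp(n_customers, alpha, seed=0):
    """Seat n_customers one by one and return the final table sizes."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_customers):
        k = crp_assign(counts, alpha, rng)
        if k == len(counts):
            counts.append(1)      # open a new table
        else:
            counts[k] += 1        # join an existing table
    return counts

table_sizes = simulate_crp(100, alpha=1.0)
```

In a hierarchical setting, a choice of this kind would be made repeatedly while walking down the aspect tree, so that the number of nodes grows with the data instead of being fixed in advance.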

Aspect-sentiment hierarchy

Goals
•  To discover and organize the aspects and associated sentiments into a hierarchy
•  To determine the aspect in each sentence
•  To determine the sentiment of each sentence

Comparison to other models

[Figure: models positioned along two axes, aspect granularity (general to specific) and sentiment (positive, neutral, negative): ASUM & JST, Multigrain Topic Model, Reverse JST, Hierarchical Aspect-Sentiment Model]

Aspect-sentiment hierarchy

•  Aspects tend to be general near the root and specific toward the leaves
•  Each aspect node consists of positive and negative polarity
•  Each sentence in a review is generated from a single aspect and sentiment
•  Each word in a sentence is either neutral or subjective

“The screen is clear and the picture quality is outstanding.”
Neutral words: the screen is and the picture
Subjective words: clear quality outstanding

“A short battery life undermines portability.”
Neutral words: A battery life portability
Subjective words: short undermines

HASM: Experiments & Results

¤ Data: Amazon reviews on laptops (10,014) and DSLRs (20,862)

¤ Aspect-sentiment hierarchies

¤ Quantitative evaluation
¤  Topic specialization
¤  Hierarchical affinity
¤  Aspect-sentiment consistency
¤  Fine-grained sentiment classification

¤ User scenario

Topic specialization

Evaluates the general-to-specific nature of the hierarchy by comparing the average distance of the aspect nodes from the root at each tree depth
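As a rough illustration of this measure, the sketch below averages the distance between each node's word distribution and the root's, grouped by depth. The slide does not name the distance function, so cosine distance is an assumption here, and the toy distributions are made up.

```python
import math
from collections import defaultdict

def cosine_distance(p, q):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def specialization_by_depth(root_dist, nodes):
    """nodes: list of (depth, word_distribution) pairs for non-root nodes."""
    by_depth = defaultdict(list)
    for depth, dist in nodes:
        by_depth[depth].append(cosine_distance(root_dist, dist))
    # Average distance from the root at each depth.
    return {d: sum(v) / len(v) for d, v in sorted(by_depth.items())}

# Made-up word distributions over a four-word vocabulary.
root = [0.25, 0.25, 0.25, 0.25]
toy_nodes = [(1, [0.4, 0.3, 0.2, 0.1]), (2, [0.9, 0.1, 0.0, 0.0])]
specialization = specialization_by_depth(root, toy_nodes)
```

A hierarchy that moves from general to specific should show this average growing with depth.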

Hierarchical affinity

Measures whether a parent-child pair is closer than a non-parent-child pair, where one node is at level L and the other at level L+1
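One way to compute such a comparison is sketched below: all pairs of nodes at levels L and L+1 are split into parent-child and non-parent-child pairs, and the mean distance of each group is returned. Cosine distance, the dictionary layout, and the toy tree are assumptions for illustration.

```python
import math
from itertools import product

def cosine_distance(p, q):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def hierarchical_affinity(nodes, level):
    """Return (mean parent-child distance, mean non-parent-child distance)."""
    uppers = [n for n, v in nodes.items() if v["level"] == level]
    lowers = [n for n, v in nodes.items() if v["level"] == level + 1]
    related, unrelated = [], []
    for u, l in product(uppers, lowers):
        d = cosine_distance(nodes[u]["dist"], nodes[l]["dist"])
        (related if nodes[l]["parent"] == u else unrelated).append(d)
    return sum(related) / len(related), sum(unrelated) / len(unrelated)

# Made-up tree: two aspects at level 1, each with one child at level 2.
toy_tree = {
    "A":  {"parent": None, "level": 1, "dist": [0.8, 0.2, 0.0, 0.0]},
    "B":  {"parent": None, "level": 1, "dist": [0.0, 0.0, 0.2, 0.8]},
    "A1": {"parent": "A",  "level": 2, "dist": [0.9, 0.1, 0.0, 0.0]},
    "B1": {"parent": "B",  "level": 2, "dist": [0.0, 0.0, 0.1, 0.9]},
}
child_mean, other_mean = hierarchical_affinity(toy_tree, level=1)
```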

Aspect-sentiment consistency

Measures how statistically coherent the topics within a node are by comparing
•  average intra-node topic distance
•  average inter-node topic distance
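The comparison can be sketched as follows, assuming each aspect node holds several topic distributions (for example, one per sentiment polarity) and again using cosine distance as a stand-in for whatever distance the authors used. The node names and numbers are made up.

```python
import math
from itertools import combinations

def cosine_distance(p, q):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def consistency(nodes):
    """nodes: dict mapping node name to a list of topic distributions.

    Returns (mean intra-node distance, mean inter-node distance).
    """
    intra, inter = [], []
    for topics in nodes.values():
        intra += [cosine_distance(p, q) for p, q in combinations(topics, 2)]
    for (_, ta), (_, tb) in combinations(nodes.items(), 2):
        inter += [cosine_distance(p, q) for p in ta for q in tb]
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Made-up topics: two polar topics per aspect node.
toy_nodes = {
    "battery": [[0.7, 0.3, 0.0, 0.0], [0.6, 0.4, 0.0, 0.0]],
    "screen":  [[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.4, 0.6]],
}
intra_mean, inter_mean = consistency(toy_nodes)
```

Coherent nodes should give an intra-node average well below the inter-node average.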

Sentiment classification accuracy

•  Sentiment classification using short (<100 characters) reviews

•  Small set contains positive reviews of 5 stars, negative reviews of 1 star

•  Large set contains positive reviews of 4~5 stars, negative reviews of 1~2 stars
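The two evaluation sets described above could be assembled with a filter like the one below. This is a sketch only: the function name and the `(text, label)` return format are assumptions, while the length cutoff and star thresholds follow the slide.

```python
def build_example(text, stars, large=False):
    """Return (text, label), or None if the review is excluded."""
    if len(text) >= 100:
        return None                      # keep only short reviews
    positive = {4, 5} if large else {5}  # large set: 4~5 stars
    negative = {1, 2} if large else {1}  # large set: 1~2 stars
    if stars in positive:
        return (text, "positive")
    if stars in negative:
        return (text, "negative")
    return None                          # e.g., 3-star reviews
```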

User scenario

Visualization of hierarchical aspect-sentiments for a user who is looking for a camera with good picture quality under low lights, a good LCD screen, and high-end lenses

Context-dependent Conceptualization (IJCAI 2013) Dongwoo Kim, Haixun Wang, Alice Oh

Semantic relatedness

Apple reveals new iPad

Microsoft introduces Surface

Surface vs iPad

Samsung’s new android tablets

iPhone 5, the best smart phone ever

By Topic Modeling

[Figure: words grouped by topic modeling: iPad, Apple, Microsoft, iPhone, Software, Samsung, SmartPhone, Android, Software Company, iOS, Mobile Phones]

Contextual relatedness

Fruit, Company, Food, Fresh fruit, Fruit tree, Brand, Crop, Flavor, Item, Manufacturer

Device, Platform, Technology, Mobile device, Tablet, Portable device, Tablet computer, Gadget, Apple product, Output device

Conceptualization given semantic context

Fruit, Company, Brand, Crop, Flavor, Item, Manufacturer

s: Semantic context of sentence

Concept of Apple; Concept of iPad

Reinforcing concepts based on context

Fruit, Company, Brand, Crop, Flavor, Item, Manufacturer
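The idea of reinforcing concepts by context can be illustrated with a simple reweighting: multiply the context-free concept probabilities of a word by a score for how well each concept fits the context, then renormalize. This is only a sketch under that assumption, not the model from the paper; the function name and all numbers are made up.

```python
def reweight_concepts(concept_given_word, context_score):
    """Scale context-free concept probabilities by a context score and renormalize."""
    raw = {c: p * context_score.get(c, 0.0) for c, p in concept_given_word.items()}
    total = sum(raw.values())
    if total == 0:
        # The context gives no evidence; fall back to the context-free distribution.
        return dict(concept_given_word)
    return {c: v / total for c, v in raw.items()}

# Made-up numbers for illustration only.
apple = {"fruit": 0.4, "company": 0.3, "brand": 0.2, "crop": 0.1}
ipad_context = {"company": 0.6, "brand": 0.3, "fruit": 0.05, "crop": 0.05}
apple_with_ipad = reweight_concepts(apple, ipad_context)
```

With these toy values, "company" overtakes "fruit" once the iPad context is taken into account.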

Context-dependent conceptualization

company 0.104, client 0.078, tree 0.069, corporation 0.050, computer 0.047, software company 0.041, oems 0.025, laptop 0.020, personal computer 0.019, host 0.019

Concept of Apple

Apple and iPad

fruit 0.039, food 0.035, company 0.026, brand 0.024, flavor 0.021, crop 0.020, juice 0.018, fresh fruit 0.017, plant 0.017, snack 0.015

Apple and Orchard

company 0.063, brand 0.041, client 0.038, corporation 0.033, tree 0.028, business 0.028, computer 0.027, crop 0.027, software company 0.022, computer company 0.021

Context-dependent conceptualization

Concept of Jordan: country 0.172, state 0.107, place 0.088, arab state 0.070, arab country 0.067, muslim country 0.052, arab nation 0.045, middle eastern country 0.042, islamic country 0.040, regime 0.023

Jordan and Basketball: place 0.284, player 0.240, team 0.177, nation 0.106, host country 0.041, professional athlete 0.021, great player 0.020, role model 0.020, shoe 0.018, offensive 0.016

Jordan and Iraq: country 0.172, state 0.107, place 0.088, arab state 0.070, arab country 0.067, muslim country 0.052, arab nation 0.045, middle eastern country 0.042, islamic country 0.040, regime 0.023

Experiments and Results

¤ Frame elements

¤ Word similarity in context

¤ Query-ad clickthrough

Experiments and Results

¤ Frame elements
¤  Background: Semantic role labeling depends heavily on annotated data such as FrameNet
¤  Problem: Building FrameNet requires expertise, and while FrameNet contains 170k annotated sentences, it lacks coverage
¤  Approach: Expand FrameNet using context-dependent conceptualization (CDC)
1.  Conceptualize the frame elements given a sentence as the context
2.  Find other instances given the most probable concepts
¤  Experiment: Compare likelihood of frame elements in unseen sentences in FrameNet
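The two-step expansion can be sketched with a toy knowledge base. Everything below is hypothetical: `CONCEPTS_OF` stands in for a resource such as Probase, its probabilities are made up, and the conceptualization step here ignores sentence context for brevity, which the actual approach does not.

```python
from collections import defaultdict

# Hypothetical miniature knowledge base: P(concept | instance), made-up values.
CONCEPTS_OF = {
    "oven": {"appliance": 0.6, "heat source": 0.4},
    "stove": {"appliance": 0.5, "heat source": 0.5},
    "camp-fire": {"heat source": 0.7, "outdoor activity": 0.3},
    "grill": {"appliance": 0.5, "heat source": 0.5},
    "tent": {"outdoor activity": 1.0},
}

def top_concepts(labeled_instances, k=1):
    """Step 1: aggregate concept scores over the labeled frame-element instances."""
    scores = defaultdict(float)
    for inst in labeled_instances:
        for concept, p in CONCEPTS_OF.get(inst, {}).items():
            scores[concept] += p
    return sorted(scores, key=scores.get, reverse=True)[:k]

def expand(labeled_instances, k=1):
    """Step 2: collect unseen instances that belong to the top concepts."""
    concepts = top_concepts(labeled_instances, k)
    seen = set(labeled_instances)
    return sorted(
        inst for inst, cs in CONCEPTS_OF.items()
        if inst not in seen and any(c in cs for c in concepts)
    )

new_instances = expand(["oven", "stove", "camp-fire"])
```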

Frame elements

Given sentence: I cook them in the oven

1.  What is the frame of this sentence? 1) abusing 2) closure 3) apply_heat

2.  What is the frame element of the word ‘oven’? 1) cooker 2) food 3) heat_source

Frame elements

Final goal, for the sentence “I cook them in the oven”:

Frame: Apply_Heat
Lexical Unit (Target): cook
FE (Frame Element): Cooker = I; Food = them; Heat source = in the oven

Frame elements: conceptualization for expansion

Frame Element : Heat_Source

… egg and chips was sizzling over camp-fires.
… the pig sizzled on the flames, spitting fat …
… a large black kettle was sizzling on the hob.
Droplets of coffee sizzled on the hotplate.
… kitchen the meat sizzled in the oven and a big pan of potatoes …
… sizzled, now and then, upon the diminutive stove

☞ Conceptualize labeled frame elements with context

Labeled elements

Frame elements: conceptualization for expansion

Concept of Heat_Source FE

Extended Heat_Source FE with Probase :

Frame elements: experiment

Per-word held-out log-likelihood of the predicted frame elements using five-fold cross-validation. The naïve approach is conceptualization using Probase without context (Song, IJCAI 2012).
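For reference, a per-word held-out log-likelihood is the average log-probability that a model assigns to the held-out words. The sketch below assumes the model is given as a word-to-probability dictionary; the `floor` for unseen words is an assumption added to avoid taking the log of zero.

```python
import math

def per_word_log_likelihood(heldout_words, word_probs, floor=1e-12):
    """Average natural-log probability of the held-out words under word_probs."""
    total = sum(
        math.log(max(word_probs.get(w, 0.0), floor)) for w in heldout_words
    )
    return total / len(heldout_words)

# Made-up predicted distribution over frame-element fillers.
score = per_word_log_likelihood(["oven", "stove"], {"oven": 0.5, "stove": 0.25})
```

Higher (less negative) values mean the predicted frame elements explain the unseen sentences better.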

Experiments and Results

¤ Frame elements

¤ Word similarity in context
¤  Background: Recent work in word similarity prediction uses annotated data of words in sentential context
¤  Problem: Existing methods are tailored specifically to word similarity, and naïve conceptualization does not consider sentential context.
¤  Approach
1.  Given two words and their sentential contexts, conceptualize the words
2.  Estimate the similarity using cosine similarity of the concept vectors
¤  Experiment: Compare the correlation between CDC-based similarity and human judgment
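The second step of the approach, cosine similarity between concept vectors, can be sketched as below. Concept vectors are represented as sparse dictionaries from concept name to weight, which is an assumption about the representation; the example vectors are made up.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors given as {concept: weight} dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty concept vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)

# Made-up concept vectors for two words in their contexts.
cuisine = {"food": 0.6, "dish": 0.3, "culture": 0.1}
food = {"food": 0.7, "dish": 0.2, "item": 0.1}
similarity = cosine_similarity(cuisine, food)
```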

Word similarity in context

¤ … Native Chinese cuisine makes frequent use of Asian leafy vegetables like bok choy and kai-lan and puts a greater emphasis on fresh meat …

¤ … American Chinese food is usually less pungent than authentic cuisine …

¤ Human evaluation = 9.2 (0~10 scale)

Word similarity in context

¤  ... This system would be implemented into the national response plan for bioweapons attacks in the Netherlands . Researchers at Ben Gurion University in Israel are developing a different device called the BioPen , essentially a “Lab-in-a-Pen” …

¤ … originally written in 1969 and performed extensively at the time by an Israeli military performing group , has become one of the anthems of the Israeli peace camp . During the Arab uprising known as the First Intifada , Israeli singer Si Heyman sang “Yorim VeBokhim” …

¤ Human evaluation = 8.1 (0~10 scale)

Word similarity in context: Results

Note: State-of-the-art word similarity method yields correlation of 0.66 (Huang ACL 2012)

Experiments and Results

¤ Frame elements

¤ Word similarity in context

¤ Query-ad clickthrough
¤  Background: Matching ads with user queries is an important but difficult task. Clickthrough rate for sponsored links is generally very low.
¤  Problem: Ad bids and user queries are short sequences of keywords that do not benefit from full NLP techniques. But simple keyword expansion methods are inaccurate.
¤  Approach: Use CDC for both ad bids and queries and match them using cosine similarity of the concept vectors.
¤  Experiment: Using search results of Bing, compare the correlation of query-ad concept similarity and CTR.
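The slide does not say which correlation coefficient was used. As one possibility, the sketch below computes the Pearson correlation between query-ad similarity scores and clickthrough rates; the choice of Pearson and the toy numbers are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Made-up query-ad similarity scores and clickthrough rates.
similarities = [0.1, 0.4, 0.5, 0.9]
ctrs = [0.01, 0.02, 0.02, 0.05]
r = pearson(similarities, ctrs)
```

A positive value would indicate that ads whose concept vectors are closer to the query tend to be clicked more often.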