Post on 05-Dec-2014
Hierarchical Aspect-Sentiment Model & Context-Dependent Conceptualization
Alice Oh alice.oh@kaist.edu http://uilab.kaist.ac.kr/ April 11, 2013
Overview
¤ Hierarchical Aspect-Sentiment Model (AAAI 2013)
  ¤ Suin Kim, et al.
  ¤ Collaboration with Microsoft Research Asia
¤ Context-Dependent Conceptualization (IJCAI 2013)
  ¤ Dongwoo Kim, Haixun Wang, Alice Oh
  ¤ Collaboration with Microsoft Research Asia
Users & Information Lab @ KAIST
Hierarchical Aspect-Sentiment Model (AAAI-13) Suin Kim, Jianwen Zhang, Zheng Chen, Alice Oh, and Shixia Liu
Hierarchical aspect-sentiment model
¤ Goal: To discover a hierarchy of aspects and associated sentiments from a corpus of online reviews
¤ Assumptions
  ¤ Each sentence expresses a single aspect and a single sentiment
  ¤ An aspect (e.g., “battery life”) consists of neutral, positive, and negative words
¤ Model: A hierarchical aspect-sentiment joint model using the recursive Chinese restaurant process (rCRP)
¤ Results
  ¤ A reasonable hierarchy of aspects discovered without supervision
  ¤ Sentiment classification accuracy comparable to other recent sentiment-aspect joint models
Aspect-sentiment hierarchy
Goals
• To discover and organize the aspects and associated sentiments into a hierarchy
• To determine the aspect in each sentence
• To determine the sentiment of each sentence
Comparison to other models
[Figure: ASUM & JST, Multigrain Topic Model, Reverse JST, and the Hierarchical Aspect-Sentiment Model (HASM) compared by aspect granularity (General, Specific) and sentiment polarity (Positive, Neutral, Negative)]
Aspect-sentiment hierarchy
• Aspects tend to be general near the root and specific toward the leaves
• Each aspect node consists of positive and negative polarity
• Each sentence in a review is generated from a single aspect and sentiment
• Each word in a sentence is either neutral or subjective
“The screen is clear and the picture quality is outstanding.”
the screen is and the picture
clear quality outstanding
“A short battery life undermines portability.”
A battery life portability
short undermines
HASM: Experiments & Results
¤ Data: Amazon reviews on laptops (10,014) and DSLRs (20,862)
¤ Aspect-sentiment hierarchies
¤ Quantitative evaluation
  ¤ Topic specialization
  ¤ Hierarchical affinity
  ¤ Aspect-sentiment consistency
  ¤ Fine-grained sentiment classification
¤ User scenario
Topic specialization
Evaluates the general-to-specific nature of the hierarchy by comparing the average distance of the aspect nodes from the root at each tree depth
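A minimal sketch of how this measure could be computed, assuming each aspect node is represented by a word distribution and that distance means cosine distance (the slide does not name the metric). The function names and toy numbers are illustrative only.

```python
import math

def cosine_distance(p, q):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def topic_specialization(root, nodes_by_depth):
    """Average distance from the root distribution at each tree depth."""
    return {
        depth: sum(cosine_distance(root, node) for node in nodes) / len(nodes)
        for depth, nodes in nodes_by_depth.items()
    }

# Toy word distributions over a four-word vocabulary (made up for illustration).
root = [0.25, 0.25, 0.25, 0.25]
nodes_by_depth = {
    1: [[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]],
    2: [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]],
}
scores = topic_specialization(root, nodes_by_depth)
```

In a general-to-specific hierarchy the average distance should grow with depth, as it does for the toy nodes here.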
Hierarchical affinity
Measures whether a parent-child pair shows smaller distance compared to a non-parent-child pair, one at level L and another at level L+1
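A rough sketch of this comparison, again assuming cosine distance between word distributions (an assumption, since the slide does not specify the metric). The node ids and the `child_parent` mapping are hypothetical.

```python
import math

def cosine_distance(p, q):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def hierarchical_affinity(parents, children, child_parent):
    """Return (mean parent-child distance, mean non-parent-child distance).

    parents:      node id -> distribution at level L
    children:     node id -> distribution at level L+1
    child_parent: child id -> id of its actual parent
    """
    related, unrelated = [], []
    for pid, p in parents.items():
        for cid, c in children.items():
            bucket = related if child_parent[cid] == pid else unrelated
            bucket.append(cosine_distance(p, c))
    return sum(related) / len(related), sum(unrelated) / len(unrelated)

# Toy two-level tree (values made up for illustration).
parents = {"A": [0.7, 0.1, 0.1, 0.1], "B": [0.1, 0.1, 0.1, 0.7]}
children = {"a1": [0.8, 0.1, 0.05, 0.05], "b1": [0.05, 0.05, 0.1, 0.8]}
child_parent = {"a1": "A", "b1": "B"}
child_dist, non_child_dist = hierarchical_affinity(parents, children, child_parent)
```

A hierarchy with good affinity yields a smaller first value than second.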
Aspect-sentiment consistency
Measures how statistically coherent the topics within a node are by comparing
• average intra-node topic distance
• average inter-node topic distance
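The intra-node versus inter-node comparison can be sketched as follows, assuming each aspect node holds one word distribution per sentiment polarity and that distance is cosine distance (both are assumptions made for this example).

```python
import math
from itertools import combinations

def cosine_distance(p, q):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def consistency(nodes):
    """Return (average intra-node distance, average inter-node distance).

    nodes: node name -> list of topic distributions inside that node.
    """
    intra = [
        cosine_distance(p, q)
        for topics in nodes.values()
        for p, q in combinations(topics, 2)
    ]
    inter = [
        cosine_distance(p, q)
        for (_, ta), (_, tb) in combinations(nodes.items(), 2)
        for p in ta
        for q in tb
    ]
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Toy nodes with two sentiment-polar topics each (values made up).
nodes = {
    "battery": [[0.6, 0.3, 0.05, 0.05], [0.5, 0.4, 0.05, 0.05]],
    "screen": [[0.05, 0.05, 0.6, 0.3], [0.05, 0.05, 0.4, 0.5]],
}
intra_dist, inter_dist = consistency(nodes)
```

Coherent nodes give an intra-node distance well below the inter-node distance.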
Sentiment classification accuracy
• Sentiment classification using short (<100 characters) reviews
• Small set contains positive reviews of 5 stars, negative reviews of 1 star
• Large set contains positive reviews of 4~5 stars, negative reviews of 1~2 stars
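The two evaluation sets described above could be built with a filter like the one below. This is only a sketch of the stated rules; the function name and the use of raw character length are assumptions.

```python
def sentiment_label(text, stars, large_set=False, max_chars=100):
    """Return 'positive', 'negative', or None if the review is excluded.

    Small set: 5 stars is positive, 1 star is negative.
    Large set: 4-5 stars are positive, 1-2 stars are negative.
    Only reviews shorter than max_chars characters are kept.
    """
    if len(text) >= max_chars:
        return None
    positive = {4, 5} if large_set else {5}
    negative = {1, 2} if large_set else {1}
    if stars in positive:
        return "positive"
    if stars in negative:
        return "negative"
    return None

small_label = sentiment_label("Great camera.", 4)
large_label = sentiment_label("Great camera.", 4, large_set=True)
```

A 4-star review is excluded from the small set but counts as positive in the large set.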
User scenario
Visualization of hierarchical aspect-sentiments for a user who is looking for a camera with good picture quality under low lights, a good LCD screen, and high-end lenses
Context-dependent Conceptualization (IJCAI 2013) Dongwoo Kim, Haixun Wang, Alice Oh
Semantic relatedness
Apple reveals new iPad
Microsoft introduces Surface
Surface vs iPad
Samsung’s new android tablets
iPhone 5, the best smart phone ever
By Topic Modeling
iPad Apple
Microsoft iPhone
Software Samsung
SmartPhone Android
Software Company iOS
Mobile Phones
Contextual relatedness
Apple reveals new iPad
Fruit Company
Food Fresh fruit Fruit tree
Brand Crop
Flavor Item
Manufacturer
Device Platform Technology Mobile device Tablet Portable device Tablet computer Gadget Apple product Output device
Conceptualization given semantic context
Apple reveals new iPad
Fruit Company
Food Fresh fruit Fruit tree
Brand Crop
Flavor Item
Manufacturer
Device Platform Technology Mobile device Tablet Portable device Tablet computer Gadget Apple product Output device
iPad Apple Microsoft iPhone Software Samsung SmartPhone Android Software Company iOS Mobile Phones
Semantic Context of Sentence
Concept of Apple Concept of iPad
Reinforcing concepts Based on context
Fruit Company
Food Fresh fruit Fruit tree
Brand Crop
Flavor Item
Manufacturer
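One plausible way to picture the reinforcement step is to reweight a word's context-free concept distribution by how well each concept fits the topics of the surrounding sentence. The slides do not spell out the formula, so the scoring rule below, the function name, and every number are assumptions for illustration only.

```python
def conceptualize(prior, concept_given_topic, topic_given_context):
    """Reweight P(concept | word) by the context's topic mixture.

    score(c) = P(c | word) * sum_k P(k | context) * P(c | k), then normalize.
    """
    scores = {}
    for concept, p in prior.items():
        fit = sum(
            weight * concept_given_topic[topic].get(concept, 0.0)
            for topic, weight in topic_given_context.items()
        )
        scores[concept] = p * fit
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Hypothetical context-free concepts of "Apple" and topic affinities.
prior = {"fruit": 0.4, "company": 0.4, "food": 0.2}
concept_given_topic = {
    "tech": {"company": 0.8, "fruit": 0.1, "food": 0.1},
    "farming": {"fruit": 0.6, "food": 0.3, "company": 0.1},
}
with_ipad = conceptualize(prior, concept_given_topic, {"tech": 0.9, "farming": 0.1})
with_orchard = conceptualize(prior, concept_given_topic, {"tech": 0.1, "farming": 0.9})
```

With a technology-heavy context the "company" sense dominates, and with a farming-heavy context the "fruit" sense does.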
Context-dependent conceptualization
company 0.104 client 0.078 tree 0.069
corporation 0.050 computer 0.047
software company 0.041 oems 0.025 laptop 0.020
personal computer 0.019 host 0.019
Concept of Apple
Apple and iPad
fruit 0.039 food 0.035
company 0.026 brand 0.024 flavor 0.021 crop 0.020 juice 0.018
fresh fruit 0.017 plant 0.017 snack 0.015
Apple and Orchard
company 0.063 brand 0.041 client 0.038
corporation 0.033 tree 0.028
business 0.028 computer 0.027
crop 0.027 software company 0.022 computer company 0.021
Context-dependent conceptualization
Concept of Jordan
Jordan and Basketball
Jordan and Iraq
country 0.172 state 0.107 place 0.088
arab state 0.070 arab country 0.067
muslim country 0.052 arab nation 0.045
middle eastern country 0.042 islamic country 0.040
regime 0.023
place 0.284 player 0.240 team 0.177
nation 0.106 host country 0.041
professional athlete 0.021 great player 0.020 role model 0.020
shoe 0.018 offensive 0.016
country 0.172 state 0.107 place 0.088
arab state 0.070 arab country 0.067
muslim country 0.052 arab nation 0.045
middle eastern country 0.042 islamic country 0.040
regime 0.023
Experiments and Results
¤ Frame elements
¤ Word similarity in context
¤ Query-ad clickthrough
Experiments and Results
¤ Frame elements
  ¤ Background: Semantic role labeling depends heavily on annotated data such as FrameNet
  ¤ Problem: Building FrameNet requires expertise, and while FrameNet contains 170k annotated sentences, it lacks coverage
  ¤ Approach: Expand FrameNet using CDC
    1. Conceptualize the frame elements given a sentence as the context
    2. Find other instances given the most probable concepts
  ¤ Experiment: Compare likelihood of frame elements in unseen sentences in FrameNet
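The two-step expansion can be sketched with a toy instance-to-concept table standing in for Probase. For brevity the sketch skips the sentence context in step 1, and the table, its weights, and the function name are all hypothetical.

```python
from collections import Counter

def expand_frame_element(labeled, concepts_of, top_k=1):
    """Step 1: pool the concepts of the labeled instances.
    Step 2: return unlabeled instances that share a top concept."""
    pooled = Counter()
    for instance in labeled:
        pooled.update(concepts_of.get(instance, {}))
    top = {concept for concept, _ in pooled.most_common(top_k)}
    return sorted(
        instance
        for instance, concepts in concepts_of.items()
        if instance not in labeled and top & concepts.keys()
    )

# Hypothetical isA weights (not taken from Probase).
concepts_of = {
    "oven": {"heat source": 0.6, "appliance": 0.4},
    "stove": {"heat source": 0.7, "appliance": 0.3},
    "grill": {"heat source": 0.5, "restaurant": 0.5},
    "microwave": {"appliance": 1.0},
    "apple": {"fruit": 1.0},
}
expanded = expand_frame_element({"oven", "stove"}, concepts_of)
```

Here "heat source" is the most probable pooled concept, so "grill" is proposed as a new instance of the frame element.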
Frame elements
Given sentence: I cook them in the oven
1. What is the frame of this sentence? 1) abusing 2) closure 3) apply_heat
2. What is the frame element of the word ‘oven’? 1) cooker 2) food 3) heat_source
Frame elements
Final Goal:
I cook them in the oven
Frame: Apply_Heat
Lexical Unit (Target)
FE (Frame Element): Cooker, Food, Heat source
Frame elements: conceptualization for expansion
Frame Element : Heat_Source
… egg and chips was sizzling over camp-fires.
… the pig sizzled on the flames, spitting fat
… a large black kettle was sizzling on the hob.
Droplets of coffee sizzled on the hotplate.
… kitchen the meat sizzled in the oven and a big pan of potatoes …
… sizzled, now and then, upon the diminutive stove
☞ Conceptualize labeled frame elements with context
Labeled elements
Frame elements: conceptualization for expansion
Concept of Heat_Source FE
Extended Heat_Source FE with Probase :
Frame elements: experiment
Per-word heldout log-likelihood of the predicted frame elements using five-fold validation. The naïve approach is conceptualization using Probase without context (Song, IJCAI 2012).
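Per-word held-out log-likelihood is the average log probability the model assigns to each held-out frame-element word. A minimal sketch follows; the probability floor for unseen words is an assumption added so the logarithm stays defined, and the toy probabilities are made up.

```python
import math

def per_word_log_likelihood(heldout_words, word_prob, floor=1e-12):
    """Average natural-log probability of the held-out words."""
    total = sum(
        math.log(max(word_prob.get(word, 0.0), floor)) for word in heldout_words
    )
    return total / len(heldout_words)

# Toy predicted distribution over Heat_Source fillers.
word_prob = {"oven": 0.5, "stove": 0.25, "hob": 0.25}
score = per_word_log_likelihood(["oven", "stove"], word_prob)
```

Higher (less negative) values mean the predicted frame elements explain the unseen sentences better.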
Experiments and Results
¤ Frame elements
¤ Word similarity in context
  ¤ Background: Recent work in word similarity prediction uses annotated data of words in sentential context
  ¤ Problem: Existing methods for word similarity are specifically tailored for word similarity only. Naïve conceptualization does not consider sentential context.
  ¤ Approach
    1. Given two words and their sentential contexts, conceptualize the words
    2. Estimate the similarity using cosine similarity of the concept vectors
  ¤ Experiment: Compare the correlation between CDC-based similarity and human judgment
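Step 2 of the approach, cosine similarity between two concept vectors, can be sketched with sparse dictionaries mapping concept names to probabilities. The example vectors are invented for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two sparse concept vectors (concept -> weight)."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical concept vectors for two occurrences of "cuisine" and one of "Israel".
cuisine_a = {"food": 0.6, "culture": 0.4}
cuisine_b = {"food": 0.5, "culture": 0.3, "style": 0.2}
israel = {"country": 0.7, "state": 0.3}
same_sense = cosine_similarity(cuisine_a, cuisine_b)
different = cosine_similarity(cuisine_a, israel)
```

Vectors that share concepts score close to 1, and vectors with no shared concept score 0.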
Word similarity in context
¤ … Native Chinese cuisine makes frequent use of Asian leafy vegetables like bok choy and kai-lan and puts a greater emphasis on fresh meat …
¤ … American Chinese food is usually less pungent than authentic cuisine …
¤ Human evaluation = 9.2 (0~10 scale)
Word similarity in context
¤ ... This system would be implemented into the national response plan for bioweapons attacks in the Netherlands . Researchers at Ben Gurion University in Israel are developing a different device called the BioPen , essentially a “Lab-in-a-Pen” …
¤ … originally written in 1969 and performed extensively at the time by an Israeli military performing group , has become one of the anthems of the Israeli peace camp . During the Arab uprising known as the First Intifada , Israeli singer Si Heyman sang “Yorim VeBokhim” …
¤ Human evaluation = 8.1 (0~10 scale)
Word similarity in context: Results
Note: State-of-the-art word similarity method yields correlation of 0.66 (Huang ACL 2012)
Experiments and Results
¤ Frame elements
¤ Word similarity in context
¤ Query-ad clickthrough
  ¤ Background: Matching ads with user queries is an important but difficult task. Clickthrough rate for sponsored links is generally very low.
  ¤ Problem: Ad bids and user queries are short sequences of keywords that do not benefit from full NLP techniques. But simple keyword expansion methods are inaccurate.
  ¤ Approach: Use CDC for both ad bids and queries and match them using cosine similarity of the concept vectors.
  ¤ Experiment: Using search results of Bing, compare the correlation of query-ad concept similarity and CTR.
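The experiment correlates query-ad concept similarity with CTR. Assuming the correlation is a Pearson coefficient (the slides do not say which coefficient is used), it could be computed as below; the toy inputs are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Toy similarity scores and clickthrough rates (illustrative only).
similarities = [0.1, 0.4, 0.7, 0.9]
ctrs = [0.001, 0.004, 0.012, 0.020]
r = pearson(similarities, ctrs)
```

A coefficient near 1 means that pairs with higher concept similarity also tend to receive more clicks.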
Sponsored link bid keywords
Bid keywords for sponsored links= { Rockport, Shoes }
User Query = { Rockport men shoes }
Show sponsored links when bid keywords and query match!
Query-ad clickthrough
Ad-bids           Query                       CTR
rockport shoes    rockport men boots          0.0201
rockport shoes    florsheim shoes             0.0022
rockport shoes    men dockers shoes           0.0000
replica watches   breitling copy watches      0.0833
replica watches   replica                     0.0833
replica watches   tiffany replica bracelet    0.0064
free email        e mail                      0.0454
free email        windows mail                0.0294
free email        set up free email account   0.0232
Equal weighting phrase conceptualization
Bid keywords for sponsored links = { Rockport, Shoes }
CDC of Rockport:
company 0.366 brand 0.255 town 0.183 shoe 0.071
shoe company 0.058 neighboring town 0.054
popular name brand 0.010 top brand 3.49E-08
popular brand 3.01E-08 top name 2.38E-08
CDC of Shoes:
accessory 0.092 clothes 0.051
equipment 0.049 essential 0.045 garment 0.045
shoe 0.042 fashion accessory 0.034
touch 0.033 textile 0.029 surface 0.029
How to combine two CDC results?
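Equal weighting can be read as a plain average of the per-word concept vectors. The sketch below uses the top few concept probabilities shown on the slide for Rockport and Shoes; the function name is hypothetical and the truncation to four concepts per word is for brevity.

```python
def combine_equal(concept_vectors):
    """Average several sparse concept vectors with equal weight."""
    combined = {}
    for vector in concept_vectors:
        for concept, prob in vector.items():
            combined[concept] = combined.get(concept, 0.0) + prob / len(concept_vectors)
    return combined

# Top concepts from the slide (truncated lists).
rockport = {"company": 0.366, "brand": 0.255, "town": 0.183, "shoe": 0.071}
shoes = {"accessory": 0.092, "clothes": 0.051, "equipment": 0.049, "shoe": 0.042}
phrase = combine_equal([rockport, shoes])
```

Only "shoe" appears in both vectors, so it is the one concept that receives mass from both words.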
URL title and Query Conceptualization
User Query = { Bayesian Topic Model }
Title of this page { Latent Dirichlet allocation – Wikipedia, the free encyclopedia }
Retrieve web pages based on concept similarities between URL-title and query
IDF Weighting Phrase Conceptualization
Title of Web page { Latent Dirichlet allocation – Wikipedia, the free encyclopedia }
User Query = { Bayesian Topic Model }
Are these important concepts for retrieval?
How to combine CDC results of query and title?
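IDF weighting gives rarer words more say in the combined concept vector. A minimal sketch, assuming a smoothed IDF of log(N / (1 + df)) and normalized weights; the document frequencies and concept vectors are hypothetical.

```python
import math

def idf(word, doc_freq, n_docs):
    """Smoothed inverse document frequency (the +1 smoothing is an assumption)."""
    return math.log(n_docs / (1 + doc_freq.get(word, 0)))

def combine_idf(concepts_by_word, doc_freq, n_docs):
    """Weighted average of per-word concept vectors, weights proportional to IDF."""
    weights = {word: idf(word, doc_freq, n_docs) for word in concepts_by_word}
    total = sum(weights.values())
    combined = {}
    for word, vector in concepts_by_word.items():
        for concept, prob in vector.items():
            combined[concept] = combined.get(concept, 0.0) + weights[word] / total * prob
    return combined

# Hypothetical statistics for the query { Bayesian Topic Model }.
doc_freq = {"bayesian": 9, "topic": 99, "model": 499}
concepts_by_word = {
    "bayesian": {"statistical method": 1.0},
    "topic": {"subject": 1.0},
    "model": {"person": 0.5, "statistical method": 0.5},
}
query_concepts = combine_idf(concepts_by_word, doc_freq, n_docs=1000)
```

The rare word "bayesian" dominates the result, while the frequent word "model" contributes little.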
Correlation between CTR and avg. similarity
CDC achieves higher correlations between average similarity and CTR
Model         Correlation
CDC-IDF-100   0.818
CDC-IDF-200   0.827
CDC-IDF-300   0.838
CDC-EQ-100    0.932
CDC-EQ-200    0.952
CDC-EQ-300    0.955
Keyword       0.259
IJCAI 11      0.243