MINING MULTI-FACETED OVERVIEWS OF ARBITRARY TOPICS IN A TEXT COLLECTION Xu Ling, Qiaozhu Mei, ChengXiang Zhai, Bruce Schatz Presented by: Qiaozhu Mei, UIUC 2008.08.25 1 University of Illinois at Urbana-Champaign
Transcript
Slide 1
MINING MULTI-FACETED OVERVIEWS OF ARBITRARY TOPICS IN A TEXT COLLECTION
Xu Ling, Qiaozhu Mei, ChengXiang Zhai, Bruce Schatz
Presented by: Qiaozhu Mei, UIUC
2008.08.25
University of Illinois at Urbana-Champaign
Slide 2
Motivation
The common task: mining and extracting information from a text collection with ad hoc information needs
Structured, faceted summarization:
- Clustering search results
- Integrating expert/customer reviews
- Semi-structured summarization of scientific literature
- Etc.
Slide 3
Multifaceted Text Overview
Even if relevant information is found: too much information
- 10^3 research papers
- 10^4 customer reviews
- 10^5 web search results
A multifaceted overview:
- Facet 1: Price
- Facet 2: Design
- Facet 3: Driving experience
[Figure: each facet is shown with its sentences (Sentence 1, Sentence 2, ..., Sentence k) and a word distribution, e.g. price 0.4, finance 0.3, cheap 0.05, interest 0.05]
Slide 4
Multi-Faceted Overview Mining
Unsupervised: a topic clustering problem
Limitations:
- Topics do not necessarily reflect users' preferences
- Summarizing a topic cluster is still challenging
Supervised: a categorization problem with training examples
Limitations:
- Predefined facets may not fit the needs of a particular user
- Only works for a predefined domain and topics
- Training examples for each facet are often unavailable
What is missing here? User interactions
Slide 5
More Realistic New Setup
- Allow a user to flexibly describe each facet with keywords (1-2)
- Let the user determine what they want
- Mine a multi-faceted overview in a semi-supervised way
- No need for training examples
- Technical challenge: how to cast it as a semi-supervised learning problem
Slide 6
Example (1): Consumer vs. Editor
Honda Accord 2006
Columns: Facets / Generated Overview (10k customer rev.) / Editor's Review (1)

Facet: Body Styles, Exterior Design
  Generated overview: Like the minor exterior styling changes from 2005 to 2006. Tried the Camry XLE first, nice ride, but lacked a few features i wanted, like dual zone A/C, and didn't like the wood trim....
  Editor's review: Available trim levels include... The VP provides air conditioning, power windows...
Facet: Powertrains
Facet: Safety
Facet: Interior Design
  Generated overview: The interior is beautiful - I got all of the features and the navigation is extremely easy to use. Accord's interior is top notch, nice design, clear gauges, comfy seats, lots of storage space
  Editor's review: The seating arrangements are top-notch, and the interior design and materials quality continue the high-caliber standards... The car's backseat is among the roomiest in the segment...
Facet: Driving Impressions
Slide 7
Example (2): Different Facets
Columns: Facets / User Input / Generated Overview

Facet: Design
  User input: design, style
  Generated overview: Like the minor exterior styling changes from 2005 to 2006. Accord's interior is top notch, nice design, clear gauges, comfy seats, lots of storage space
Facet: Engine
  User input: engine, fuel
Facet: Finance
  User input: finance, price
  Generated overview: When I bought it I was amazed at the trim level for the price. It is extremely fun to drive, fit and finish is fantastic, the oversteer could easily be corrected, at the price, it has no peer and is 10k less then a comparable BMW
Facet: Safety
  User input: safety
Facet: Driving
  User input: comfort, fun

What if the users want an overview with different facets?
Slide 8
Approach
Two-stage framework, using probabilistic topic models
Model each facet with a language model (word distribution)
Facet model initialization: a bootstrapping method to expand the original facet keywords with additional correlated words in the document collection
Facet model estimation: guide a generative topic model with user-defined facets
- Propose probabilistic mixture models to estimate the word distribution of every facet
- Meanwhile, constrain each facet model to be close to the user specification
Generate the overview: apply the estimated facet models to categorize the sentences into a semi-structured overview
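The last step of the approach (categorizing sentences with the estimated facet models) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenization, the floor probability for unseen words, the argmax assignment rule, and the "design" word distribution are all assumptions made for illustration. Only the "price" distribution is taken from the slides.

```python
import math

def sentence_log_likelihood(tokens, facet_model, floor=1e-4):
    # Unseen words get a small floor probability (an assumption of this sketch).
    return sum(math.log(facet_model.get(t, floor)) for t in tokens)

def categorize(sentences, facet_models):
    """Assign each sentence to the facet whose word distribution scores it highest."""
    overview = {name: [] for name in facet_models}
    for sentence in sentences:
        tokens = sentence.lower().split()
        best = max(
            facet_models,
            key=lambda name: sentence_log_likelihood(tokens, facet_models[name]),
        )
        overview[best].append(sentence)
    return overview

facet_models = {
    # Word distribution shown on the "Multifaceted Text Overview" slide.
    "price": {"price": 0.4, "finance": 0.3, "cheap": 0.05, "interest": 0.05},
    # Hypothetical distribution, invented for this example.
    "design": {"design": 0.4, "interior": 0.3, "exterior": 0.2, "style": 0.1},
}
sentences = ["The price was cheap", "Nice interior design"]
overview = categorize(sentences, facet_models)
```

Here `overview` maps each facet name to the list of sentences assigned to it, which is the semi-structured shape the slide describes.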
Slide 9
Bootstrapped facet model initialization
[Figure: words in the collection: design, feature, fun, drive, comfortable, price, horsepower, smooth, performance, fuel, safety, reliability, exterior, roof, seat, cheap, engine]
Initial facet model: performance 0.5, fuel 0.5
Bootstrapped facet model: performance 0.4, fuel 0.3, horsepower 0.05, engine 0.03, smooth 0.03
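One way to picture the bootstrapping step is the sketch below. The slides do not say how correlated words are chosen or weighted, so the document-level co-occurrence counts, the interpolation weight `alpha`, the `top_n` cutoff, and the toy documents are all assumptions of this example rather than the method from the talk.

```python
from collections import Counter

def expand_facet(seed, documents, alpha=0.7, top_n=3):
    """Expand a seed keyword distribution with words that co-occur with the seeds."""
    cooc = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        # Only documents that mention at least one seed keyword contribute.
        if any(t in seed for t in tokens):
            cooc.update(t for t in set(tokens) if t not in seed)
    top = cooc.most_common(top_n)
    if not top:
        return dict(seed)
    total = sum(count for _, count in top)
    # Keep alpha of the probability mass on the user's keywords,
    # and spread the remainder over the co-occurring words.
    model = {w: alpha * p for w, p in seed.items()}
    for w, count in top:
        model[w] = (1 - alpha) * count / total
    return model

seed = {"performance": 0.5, "fuel": 0.5}  # initial model from the slide
documents = [  # hypothetical, stopword-free toy documents
    "performance horsepower engine",
    "fuel engine smooth",
    "seat roof exterior",
]
model = expand_facet(seed, documents)
```

The result keeps most of the mass on the user's keywords while adding "engine", "horsepower", and "smooth"; words from documents that never mention a seed keyword, such as "seat", are left out.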
Slide 10
Semi-supervised facet model estimation
Guide facet model estimation with Dirichlet priors
- The Dirichlet prior can be interpreted as pseudo word counts
- Initialized distribution
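The pseudo-count reading of a Dirichlet prior can be made concrete with a small sketch. It shows the standard MAP-style update in which a prior of strength `mu` adds `mu * p0(w)` pseudo counts to the observed (or, inside EM, expected) count of each word. The function name, the value of `mu`, and the toy counts are assumptions of this example; the slides do not give the exact update used in the talk.

```python
def map_estimate(counts, prior, mu):
    """Word distribution from counts plus mu * prior[w] pseudo counts per word."""
    vocab = set(counts) | set(prior)
    total = sum(counts.values()) + mu  # assumes the prior sums to 1
    return {
        w: (counts.get(w, 0.0) + mu * prior.get(w, 0.0)) / total
        for w in vocab
    }

prior = {"performance": 0.5, "fuel": 0.5}  # initialized facet distribution
counts = {"fuel": 6.0, "engine": 4.0}      # hypothetical counts from the collection
estimate = map_estimate(counts, prior, mu=10.0)
```

A larger `mu` keeps the estimate closer to the initialized distribution, while `mu = 0` reduces to the plain maximum-likelihood estimate from the counts.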
Slide 11
Semi-supervised facet model estimation
Guide facet model estimation with regularization
[Equation: regularized objective, with its terms annotated as follows]
- The log-likelihood of the text collection
- A term that propagates the constraint through the entire collection according to document similarities
- A term that constrains the estimated facet models to be close to the initial facet models
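The exact objective is not preserved in the transcript, so the following is only a rough sketch of the general idea of trading data fit against closeness to the initial facet models. It assumes a KL-divergence penalty and a weight `lam`, both chosen for illustration, and it leaves out the document-similarity term entirely.

```python
import math

def kl_divergence(p, q, floor=1e-12):
    # KL(p || q); q is floored to avoid log(0) (an assumption of this sketch).
    return sum(
        pw * math.log(pw / max(q.get(w, 0.0), floor))
        for w, pw in p.items()
        if pw > 0
    )

def regularized_objective(log_likelihood, estimated, initial, lam=0.5):
    """Log-likelihood minus a penalty for drifting away from the initial facet models."""
    penalty = sum(kl_divergence(initial[f], estimated[f]) for f in initial)
    return log_likelihood - lam * penalty

initial = {"engine": {"performance": 0.5, "fuel": 0.5}}
close = {"engine": {"performance": 0.5, "fuel": 0.5}}
drifted = {"engine": {"performance": 0.1, "fuel": 0.1, "seat": 0.8}}

same_score = regularized_objective(-100.0, close, initial)
drift_score = regularized_objective(-100.0, drifted, initial)
```

With the same log-likelihood, the drifted model scores lower than the one that stays at the initial distribution, which is the behavior the closeness constraint is meant to encourage.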
Slide 12
Experimental Results
- The gene summarization task in biomedical literature
- The car review mining task for online customer reviews
Our proposed system, especially the regularized topic model, is quite effective in mining multi-faceted overviews

Facets  Prior  Reg   MQR
SI      0.44   0.45  0.47
GI      0.51   0.47  0.41
GP      0.20   0.22  0.20
EL      0.22   0.25  0.18
MP      0.25   -     0.20
WFPI    0.09   0.19  0.15
Avg.    0.29   0.31  0.27

Facets  Prior  Reg    MQR
BS      0.193  0.200  0.174
PP      0.273  0.278  0.207
SF      0.235  0.243  0.208
IF      0.309  0.324  0.294
DI      0.316  0.319  0.264
Avg.    0.265  0.273  0.229

Metrics: ROUGE-1 Average R scores; Precision @5
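For readers unfamiliar with the two metrics named on this slide, here is a minimal sketch of how they are commonly computed. It assumes that "R" denotes recall, that relevance judgments are given as booleans in rank order, and that text is tokenized on whitespace; the evaluation setup used in the talk may differ.

```python
from collections import Counter

def precision_at_k(ranked_relevance, k=5):
    """Fraction of the top-k ranked items that are judged relevant."""
    return sum(1 for relevant in ranked_relevance[:k] if relevant) / k

def rouge1_recall(candidate, reference):
    """Unigram overlap with the reference, divided by the reference length."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())  # assumed to be non-empty
    overlap = sum(min(count, cand[w]) for w, count in ref.items())
    return overlap / sum(ref.values())

p5 = precision_at_k([True, False, True, True, False, True])
r1 = rouge1_recall("the interior is top notch", "interior design is top notch")
```

In this toy example three of the top five items are relevant, and four of the five reference unigrams appear in the candidate.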
Slide 13
Please stop by our poster on Tuesday