In Situ Evaluation of Entity Ranking and Opinion Summarization using findilike
Kavita Ganesan & ChengXiang Zhai
University of Illinois at Urbana-Champaign
www.findilike.com
What is findilike?
• Preference-driven search engine
– Currently works in the hotels domain
– Finds & ranks hotels based on user preferences:
   Structured: price, distance
   Unstructured: “friendly service”, “clean”, “good views” (based on existing user reviews) [unique feature]
• Beyond search: support for analysis of hotels
– Opinion summaries
– Tag cloud visualization of reviews
What is findilike? (continued)
• Developed as part of PhD work; a new system (Opinion-Driven Decision Support System, UIUC, 2013)
• Tracked ~1000 unique users from Jan to Aug ’13
– Working on speed & reaching out to more users
Evaluating Review Summarization
Mini Test-bed
• Base code to extend
• Set of sample sentences
• Gold standard summary for those sentences
• ROUGE toolkit to evaluate the results
• Data set based on Ganesan et al. 2010
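
To make concrete what a ROUGE score on the test-bed measures, here is a minimal sketch of ROUGE-1 recall (unigram overlap with the gold standard summary). It is not the ROUGE toolkit itself: the class name `Rouge1Sketch`, the whitespace tokenization, and the example sentences are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class Rouge1Sketch {
    // Counts lowercased, whitespace-separated tokens (assumed tokenization).
    static Map<String, Integer> counts(String text) {
        Map<String, Integer> m = new HashMap<>();
        for (String tok : text.toLowerCase().trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                m.merge(tok, 1, Integer::sum);
            }
        }
        return m;
    }

    // ROUGE-1 recall: clipped unigram overlap divided by the number of reference unigrams.
    static double recall(String candidate, String reference) {
        Map<String, Integer> cand = counts(candidate);
        Map<String, Integer> ref = counts(reference);
        int overlap = 0;
        int total = 0;
        for (Map.Entry<String, Integer> e : ref.entrySet()) {
            total += e.getValue();
            overlap += Math.min(e.getValue(), cand.getOrDefault(e.getKey(), 0));
        }
        return total == 0 ? 0.0 : (double) overlap / total;
    }

    public static void main(String[] args) {
        // Hypothetical candidate and gold summary, not taken from the data set.
        System.out.println(recall("very clean rooms", "clean rooms and friendly staff"));
    }
}
```

The toolkit reports further variants (for example ROUGE-2 and precision/F-scores); this sketch only shows the basic overlap idea.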
Evaluating Entity Ranking
Mini Test-bed
• Base code to extend
• Terrier Index of hotel reviews
• Gold standard ranking of hotels
• Code to generate nDCG scores.
• Raw unindexed data set for reference
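
For orientation, a minimal sketch of how an nDCG score can be computed from a system ranking and a gold standard ranking. This is not the test-bed's own scoring code: the class name `NdcgSketch`, the linear gain with log2 discount, and the example relevance grades are assumptions.

```java
import java.util.Arrays;

class NdcgSketch {
    // DCG@k with linear gain and log2 discount: sum of rel_i / log2(rank_i + 1).
    static double dcg(double[] rels, int k) {
        double sum = 0.0;
        for (int i = 0; i < Math.min(k, rels.length); i++) {
            // i is 0-based, so the rank is i + 1 and the discount is log2(i + 2).
            sum += rels[i] / (Math.log(i + 2) / Math.log(2));
        }
        return sum;
    }

    // nDCG@k: DCG of the system ranking divided by DCG of the ideal (gold) ordering.
    static double ndcg(double[] systemRels, double[] goldRels, int k) {
        double[] ideal = goldRels.clone();
        Arrays.sort(ideal); // ascending
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) { // reverse to descending
            double tmp = ideal[i];
            ideal[i] = ideal[j];
            ideal[j] = tmp;
        }
        double idealDcg = dcg(ideal, k);
        return idealDcg == 0.0 ? 0.0 : dcg(systemRels, k) / idealDcg;
    }

    public static void main(String[] args) {
        double[] gold = {3, 2, 1}; // hypothetical relevance grades of three hotels
        System.out.println(ndcg(new double[] {3, 2, 1}, gold, 3)); // gold order
        System.out.println(ndcg(new double[] {1, 2, 3}, gold, 3)); // reversed order
    }
}
```

Here `systemRels` holds the gold relevance of each hotel in the order the new ranking model returned it, so a ranking that matches the gold order scores highest.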
Building a new ranking model
Extend Weighting Model
2 components that can be evaluated through natural user interaction
1. Ranking entities based on unstructured user preferences
   Opinion-Based Entity Ranking (Ganesan & Zhai 2012)
2. Summarization of reviews
   Generating short phrases summarizing key opinions (Ganesan et al. 2010, 2012)
Evaluation of entity ranking
• Retrieval
– Interleave results: balanced interleaving (T. Joachims, 2002)
[Figure: results of two rankers, labeled Base and DirichletLM, interleaved into one result list]
• A click indicates preference…
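
The list-merging step of balanced interleaving can be sketched as follows. This follows the commonly described form of the method and is not findilike's production code: the class name `BalancedInterleaveSketch`, the hotel ids, and passing the coin flip in as `aFirst` are assumptions. Attributing clicks back to each ranker is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class BalancedInterleaveSketch {
    // Merges rankings a and b so that any prefix of the result draws about equally
    // from the top of both lists; aFirst is the coin flip that decides who leads.
    static List<String> interleave(List<String> a, List<String> b, boolean aFirst) {
        LinkedHashSet<String> merged = new LinkedHashSet<>(); // keeps order, drops duplicates
        int ka = 0;
        int kb = 0;
        while (ka < a.size() && kb < b.size()) {
            if (ka < kb || (ka == kb && aFirst)) {
                merged.add(a.get(ka)); // next unseen position in ranking A
                ka++;
            } else {
                merged.add(b.get(kb)); // next unseen position in ranking B
                kb++;
            }
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Hypothetical hotel ids returned by two ranking models.
        List<String> base = Arrays.asList("h1", "h2", "h3");
        List<String> dlm = Arrays.asList("h4", "h1", "h5");
        System.out.println(interleave(base, dlm, true));
    }
}
```

In a live system the coin flip would be drawn at random per query so that neither ranker is systematically favored at the top position.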
Snapshot of pairwise comparison results for entity ranking
A     B      CA > CB      CB > CA      CA = CB > 0   CA = CB = 0   Total
             (A better)   (B better)   (tie)
DLM   Base   30           35           2             5             72
PL2   Base   10           28           3             7             48
…     …      …            …            …             …             …
• A, B: algorithms compared (DirichletLM, Base, PL2)
• CA > CB: # queries where A is better
• CB > CA: # queries where B is better
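
A minimal sketch of how per-query click counts could be tallied into the columns of the table above. The class name `PairwiseTallySketch`, the array layout, and the example counts are assumptions; the numbers are made up and are not the study's data.

```java
import java.util.Arrays;

class PairwiseTallySketch {
    // Each row of clicksPerQuery is {CA, CB}: clicks credited to A and to B for one query.
    // Returns {A better, B better, tie with clicks, no clicks, total}.
    static int[] tally(int[][] clicksPerQuery) {
        int[] out = new int[5];
        for (int[] q : clicksPerQuery) {
            int ca = q[0];
            int cb = q[1];
            if (ca > cb) {
                out[0]++;        // CA > CB: A better
            } else if (cb > ca) {
                out[1]++;        // CB > CA: B better
            } else if (ca > 0) {
                out[2]++;        // CA = CB > 0: tie
            } else {
                out[3]++;        // CA = CB = 0: no clicks
            }
            out[4]++;            // total queries
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] clicks = {{2, 1}, {0, 3}, {1, 1}, {0, 0}, {0, 1}}; // invented example
        System.out.println(Arrays.toString(tally(clicks)));
    }
}
```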
Snapshot of pairwise comparison results for entity ranking: interpretation
• DLM vs. Base (A better on 30 queries, B better on 35): Base model better, but DLM not too far behind
• PL2 vs. Base (A better on 10 queries, B better on 28): Base model better, and PL2 is well behind
Evaluation of review summarization
• Randomly mix top N phrases from two algorithms (ALGO1, ALGO2)
• Monitor click-through on a per-entity basis
• More clicks on phrases from Algo1 vs. Algo2 → Algo1 better
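
The mixing and click comparison described above can be sketched as follows. The class `PhraseMixSketch`, the `Tagged` holder, and the example phrases are hypothetical, and the sketch does not handle a phrase that both algorithms produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class PhraseMixSketch {
    // A displayed phrase tagged with the (hidden) algorithm that produced it.
    static final class Tagged {
        final String phrase;
        final int algo;
        Tagged(String phrase, int algo) {
            this.phrase = phrase;
            this.algo = algo;
        }
    }

    // Takes the top n phrases of each algorithm and shuffles them into one display list.
    static List<Tagged> mix(List<String> algo1, List<String> algo2, int n, Random rng) {
        List<Tagged> shown = new ArrayList<>();
        for (String p : algo1.subList(0, Math.min(n, algo1.size()))) {
            shown.add(new Tagged(p, 1));
        }
        for (String p : algo2.subList(0, Math.min(n, algo2.size()))) {
            shown.add(new Tagged(p, 2));
        }
        Collections.shuffle(shown, rng);
        return shown;
    }

    // Compares clicks for one entity: returns 1 or 2 for the winner, 0 on a tie.
    static int winner(List<Tagged> clicked) {
        int c1 = 0;
        int c2 = 0;
        for (Tagged t : clicked) {
            if (t.algo == 1) {
                c1++;
            } else {
                c2++;
            }
        }
        if (c1 > c2) {
            return 1;
        }
        if (c2 > c1) {
            return 2;
        }
        return 0;
    }

    public static void main(String[] args) {
        List<Tagged> shown = mix(
                Arrays.asList("clean rooms", "friendly staff", "great views"),
                Arrays.asList("good location", "noisy street"),
                2, new Random());
        for (Tagged t : shown) {
            System.out.println(t.phrase);
        }
    }
}
```

Keeping the algorithm tag out of what the user sees is what lets a click count as an unbiased vote for the algorithm that produced the phrase.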
How to submit a new algorithm?
1. Implementation
– Write Java-based code
– Extend existing code
2. Test on mini test bed
– Mini test bed: test data & gold standard, evaluator (nDCG, ROUGE), sample code
– Result: local performance
3. Submit code
4. Online performance
– Performance report, e.g.:
   A     B      CA > CB (A better)   CB > CA (B better)
   DLM   Base   30                   35
   PL2   Base   10                   28
   …     …      …                    …
More information about evaluation…
eval.findilike.com
Thanks! Questions?
Links
• Evaluation: http://eval.findilike.com
• System: http://hotels.findilike.com/
• Related Papers: kavita-ganesan.com
References
• Ganesan, K. A., C. X. Zhai, and E. Viegas, Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions, Proceedings of the 21st International Conference on World Wide Web 2012 (WWW '12), 2012.
• Ganesan, K. A., and C. X. Zhai, Opinion-Based Entity Ranking, Information Retrieval, vol. 15, issue 2, 2012.
• Ganesan, K. A., C. X. Zhai, and J. Han, Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions, Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10), 2010.
• Joachims, T., Optimizing Search Engines Using Clickthrough Data, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), NY, 2002.