Identifying Comparative Sentences in Text Documents
Nitin Jindal and Bing Liu
University of Illinois
SIGIR 2006
Introduction
• Comparisons are one of the most convincing ways of evaluation.
• Much of this information is available on the Web: customer reviews, forum discussions, and blogs.
• Useful for product manufacturers and potential customers (to make purchasing decisions).
Comparisons vs. Opinions
• Comparisons can be either objective or subjective.
• Comparative sentences have different language constructs from typical opinion sentences.
• Comparative sentences may contain some indicators.
Car X is much better than Car Y
Car X is two feet longer than Car Y
Related Work
• Linguistics: based on grammars (syntax and semantics) and logic (gradability), which is more for human consumption than for automatic identification.
• Opinion tasks: opinion extraction and classification, which are quite different from comparative sentence identification.
Comparatives (Linguistic)
• Comparatives are used to express explicit orderings between objects with respect to the degree or amount to which they possess some gradable property.
John is taller than he was
=>
John is tall to degree d
Comparatives (Linguistic)
• Two broad types:
– Metalinguistic Comparatives: compare properties of one entity.
Ronaldo is angrier than upset.
– Propositional Comparatives: compare between two propositions. Three subcategories:
Comparatives (Propositional)
• Nominal Comparatives: (two sets of entities)
Paul ate more grapes than bananas.
• Adjectival Comparatives: (than, as good as)
Ford is cheaper than Volvo.
• Adverbial Comparatives: (occur after a verb phrase)
Tom ate more quickly than Jane.
Superlatives
• Adjectival Superlatives:
John is the tallest person.
• Adverbial Superlatives:
Jill did her homework most frequently.
• Equality: conjunctions like and, or, …
John and Sue, both like sushi.
POS involved
• NN: Noun
• NNP: Proper Noun
• VBZ: Verb, present tense, 3rd person singular
• JJ: Adjective
• RB: Adverb
• JJR: Adjective, comparative
• JJS: Adjective, superlative
• RBR: Adverb, comparative
• RBS: Adverb, superlative
Limitations of Linguistic Classification
• Non-comparatives with comparative words: many non-comparatives contain comparative words.
In the context of speed, faster means better.
John has to try his best to win this game.
• Limited coverage: many comparatives contain no comparative words.
In market capital, Intel is way ahead of Amd.
Nokia Samsung, both cell phones perform badly on heat dissipation index.
The M7500 earned a World bench score of 85, whereas Asus A3V posted a mark of 89.
Enhancements
• First limitation: machine learning methods to distinguish comparatives and non-comparatives.
• Second limitation:
– User preferences:
I prefer Intel to Amd = Intel is better than Amd
– Implicit comparatives:
Camera X has 2 MP, whereas camera Y has 5 MP.
Types of Comparatives
• Non-Equal Gradable: greater or less than type, including user preferences.
• Equative (Gradable): equal to type
• Superlative (Gradable): greater or less than all others type
• Non-Gradable:
– A is similar to B; A has feature F1 while B has F2; A has feature F but B doesn’t
Tasks
• Identifying comparative sentences from a given text data set.
• Extracting comparative relations from sentences. (Mining comparative sentences and relations, AAAI 2006)
Class Sequential Rules with Multiple Minimum Supports
• A class sequential rule (CSR) has a sequential pattern on the left and a class label on the right.
• Select patterns: keywords – POS (JJR, RBR, JJS, RBS) + Words (favor, prefer, win, beat, but…) + Phrases (number one, up against)
• Using keywords alone gives precision P = 32% and recall R = 94%.
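The keyword filter and its precision/recall measurement can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keyword sets contain only the examples named on the slide, the input is assumed to be already POS-tagged in `word/TAG` form, and the function names (`parse_tagged`, `has_keyword`, `precision_recall`) and gold labels are hypothetical.

```python
# Keyword lists below are only the examples named on the slide; the full
# lists are not given here, so treat them as illustrative.
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}
KEYWORD_WORDS = {"favor", "prefer", "win", "beat", "but"}
KEYWORD_PHRASES = {("number", "one"), ("up", "against")}


def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into lowercase (word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word.lower(), tag))
    return pairs


def has_keyword(tagged):
    """True if the sentence contains a comparative POS tag, word, or phrase."""
    words = [word for word, _ in tagged]
    if any(tag in COMPARATIVE_TAGS for _, tag in tagged):
        return True
    if any(word in KEYWORD_WORDS for word in words):
        return True
    bigrams = zip(words, words[1:])
    return any(bigram in KEYWORD_PHRASES for bigram in bigrams)


def precision_recall(predicted, actual):
    """Precision and recall of boolean predictions against gold labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, hand-tagged sentences with hypothetical gold labels.
samples = [
    ("ford/NNP is/VBZ cheaper/JJR than/IN volvo/NNP", True),
    ("faster/JJR means/VBZ better/JJR", False),  # keyword, yet not a comparison
    ("the/DT lens/NN is/VBZ sharp/JJ", False),
]
predicted = [has_keyword(parse_tagged(text)) for text, _ in samples]
actual = [label for _, label in samples]
precision, recall = precision_recall(predicted, actual)
```

On this toy set the filter catches every comparative but also flags a non-comparative that merely contains a comparative word, which is the high-recall, low-precision behaviour the slide reports at scale.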
Support and Confidence
• Using the minimum support of 20% and minimum confidence of 40%, one of the discovered CSRs is:
Building the Sequence DB
this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD
{NN}{VBZ}{RB}{moreJJR}{NN}{IN}{NN} -> comparative
• Sequences that exceed the 60% confidence threshold become rules. Minimum support = 10%.
• 13 manual rules using conjunctions such as whereas/IN, but/CC, however/RB, while/IN, though/IN, although/IN, etc.
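How support and confidence decide whether a sequence becomes a rule can be sketched as below. This assumes the usual definitions (support: fraction of all sequences that contain the pattern and carry the class; confidence: fraction of sequences containing the pattern that carry the class), treats each sequence element as a single item, and uses one support threshold rather than multiple minimum supports. The toy database and the names `contains` and `support_confidence` are hypothetical.

```python
def contains(sequence, pattern):
    """True if pattern is an ordered (not necessarily contiguous) subsequence."""
    remaining = iter(sequence)
    # 'item in remaining' advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)


def support_confidence(db, pattern, label):
    """Support and confidence of the rule pattern -> label over db."""
    covered = [cls for seq, cls in db if contains(seq, pattern)]
    matched = sum(1 for cls in covered if cls == label)
    support = matched / len(db) if db else 0.0
    confidence = matched / len(covered) if covered else 0.0
    return support, confidence


# Toy sequence database; the first entry mirrors the pattern on the slide,
# the rest are made up for illustration.
db = [
    (["NN", "VBZ", "RB", "moreJJR", "NN", "IN", "NN"], "comparative"),
    (["NN", "VBZ", "moreJJR", "NN", "IN", "NN"], "comparative"),
    (["NN", "VBZ", "moreJJR", "NN"], "non-comparative"),
    (["NN", "VBZ", "JJ"], "non-comparative"),
]

pattern = ["NN", "VBZ", "moreJJR", "NN"]
support, confidence = support_confidence(db, pattern, "comparative")

# Thresholds quoted on the slide: minimum support 10%, confidence 60%.
is_rule = support >= 0.10 and confidence >= 0.60
```

Here the pattern occurs in three of the four sequences, two of them comparative, so it clears both thresholds and would be kept as a rule.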
Classification Learning
• Machine learning methods:
Feature Set = {X | X is the sequential pattern in CSR X → y} ∪ {Z | Z is the pattern in a manual rule Z → y}
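One way to turn this feature set into classifier input is a binary vector with one position per pattern. The slide does not say how features are encoded, so the binary encoding is an assumption, as are the example patterns, the `whereasIN` item spelling, and the names `contains` and `to_feature_vector`.

```python
def contains(sequence, pattern):
    """True if pattern is an ordered (not necessarily contiguous) subsequence."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def to_feature_vector(sequence, feature_set):
    """One binary feature per pattern: 1 if the sequence contains it, else 0."""
    return [1 if contains(sequence, pattern) else 0 for pattern in feature_set]


# Hypothetical patterns: one mined CSR pattern X and one manual-rule pattern Z.
csr_patterns = [("NN", "VBZ", "moreJJR", "NN")]
manual_patterns = [("whereasIN",)]
feature_set = csr_patterns + manual_patterns  # union of the two pattern sets

camera = ["NN", "VBZ", "RB", "moreJJR", "NN", "IN", "NN"]
scores = ["NN", "VBZ", "CD", "whereasIN", "NN", "VBZ", "CD"]

camera_vec = to_feature_vector(camera, feature_set)
scores_vec = to_feature_vector(scores, feature_set)
```

Vectors of this shape could then be passed to any standard classifier together with the comparative / non-comparative labels.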
Data Preparation
• Consumer reviews on products such as digital cameras, DVD players, MP3 players and cellular phones.
• Forum discussions on topics such as Intel vs. AMD, Coke vs. Pepsi, and Microsoft vs. Google.
• News articles on topics such as automobiles, iPods, and soccer vs. football.
Number of Sentences in Data Sets
Experimental Results (1)
Experimental Results (2)
• Reviews: low recall, high precision. Sentences are short, so patterns are hard to find.
• Articles and Forums: high recall, low precision. Sentences are long, so patterns are found too easily or too many patterns are found.
Conclusion and Future Work
• Identifying comparative sentences.
• Analyzing different types of comparative sentences.
• Studying how to automatically classify subjective and objective comparisons.