Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis
based on the research paper by Theresa Wilson, Janyce Wiebe and Paul Hoffmann
Seminar Data Analytics I, International Master's Program in Data Analytics
Stiftung Universität Hildesheim, Summer Semester 2018
Sathish Kumar Chandrasekaran
12/Jun/18 "Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis" presentation by Sathish Kumar
Outline
About the Paper
Sentiment Analysis
Goal
Approach
Corpus preparation
Experiments
Related work
Future work
Critiques
Summary
About the Paper
✔Paper by Theresa Wilson, Janyce Wiebe and Paul Hoffmann, in Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver, October 2005
✔Work supported by the National Science Foundation (NSF) and the Advanced Research and Development Activity (ARDA)
✔Cited by 2656 articles; latest (2018):
● Inducing a Lexicon of Abusive Words – a Feature-Based Approach
Sentiment Analysis
✔ The task of identifying positive and negative opinions, emotions, and evaluations
✔ The depth of analysis depends on the application
✔ Document-level analysis
✔ Review analysis, identifying inflammatory messages
✔ Phrase-/sentence-level analysis
✔ Multi-perspective question answering and summarization, mining product reviews, etc.
Question Answering
➢Q: What is the international reaction to the re-election of Robert Mugabe as President of Zimbabwe?
African observers generally approved of his victory while western Governments denounced it
➢Another Example:
"We don't hate the sinner," he says, "but we hate the sin."
➢Quite common for two or more sentiments to be expressed within a single sentence
Motivation
Typical approach:
✔Use a lexicon of positive and negative words, with each entry tagged with the word's prior polarity
✔Prior polarity: positive or negative, out of context
Prior polarity in Lexicon

  Word         Tag
  trust        pos
  well         pos
  reason       pos
  reasonable   pos
  polluter     neg
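A minimal sketch of this typical lexicon-lookup approach (the dictionary entries below are taken from the slide's example table; the default of "neutral" for out-of-lexicon words is an assumption for illustration):

```python
# Minimal sketch of the "typical approach": tag each word with its
# prior polarity from a lexicon.  Entries are the slide's examples.
PRIOR_POLARITY = {
    "trust": "pos",
    "well": "pos",
    "reason": "pos",
    "reasonable": "pos",
    "polluter": "neg",
}

def prior_tags(tokens):
    """Return (token, prior polarity) pairs, defaulting to 'neutral'."""
    return [(t, PRIOR_POLARITY.get(t.lower(), "neutral")) for t in tokens]

print(prior_tags(["no", "reason", "to", "trust", "the", "polluter"]))
```

Note that this ignores context entirely, which is exactly the limitation motivating the paper.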
Motivation
Given a context
Philip Clapp, president of the National Environment Trust, sums up well the general thrust of the reaction of environmental movements: "There is no reason at all to believe that the polluters are suddenly going to become reasonable."
Prior polarity in Lexicon

  Word         Tag
  trust        pos
  well         pos
  reason       pos
  reasonable   pos
  polluter     neg
Motivation
Given a context
Philip Clapp, president of the National Environment Trust, sums up well the general thrust of the reaction of environmental movements: "There is no reason at all to believe that the polluters are suddenly going to become reasonable."
Contextual Polarity: A word may appear in a phrase that expresses a different polarity in context
  Word         Prior polarity (Lexicon)   Contextual polarity
  trust        pos                        neutral
  well         pos                        pos
  reason       pos                        neg
  reasonable   pos                        neg
  polluter     neg                        neutral
Goal of the research
●To automatically distinguish between prior and contextual polarity
●To pinpoint expressions of positive and negative sentiments
●To determine when an opinion is not being expressed by a word or phrase that typically does evoke one
●To understand which features are important for this task
Approach
● Starting from a large set of subjectivity clues from a lexicon, tagged with prior polarity, identify the contextual polarity of the phrases in the corpus that contain instances of those clues
● Two-step approach:
  Step 1: a neutral-polar classifier (28 features) separates neutral from polar clue instances
  Step 2: a polarity classifier (10 features) assigns the contextual polarity of the remaining polar instances
● Result: the contextual polarity of all sentiment expressions
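The two-step pipeline can be sketched as follows; `neutral_polar` and `polarity` are hypothetical placeholders standing in for the paper's trained classifiers:

```python
# Hedged sketch of the two-step approach: a neutral-vs-polar classifier
# followed by a polarity classifier.  The two classifier arguments are
# placeholders, not the paper's actual models.
def classify_contextual_polarity(instances, neutral_polar, polarity):
    results = {}
    for inst in instances:
        # Step 1: filter out clue instances that are neutral in context.
        if neutral_polar(inst) == "neutral":
            results[inst] = "neutral"
        else:
            # Step 2: disambiguate the remaining polar instances.
            results[inst] = polarity(inst)  # positive / negative / both / neutral
    return results

# Toy stand-in classifiers, for illustration only.
demo = classify_contextual_polarity(
    ["well", "trust"],
    neutral_polar=lambda w: "polar" if w == "well" else "neutral",
    polarity=lambda w: "positive",
)
print(demo)  # {'well': 'positive', 'trust': 'neutral'}
```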
Prior polarity subjectivity lexicon
●Lexicon of over 8000 subjectivity clues: words used to express private states (mental and emotional states such as beliefs, emotions, speculations, sentiments)
●Lexicon compilation:
✔List of subjectivity clues with reliability tags (strongsubj = subjective in most contexts; weaksubj) from (Riloff and Wiebe, 2003)
✔Positive and negative word lists from the General Inquirer (with reliability tags)
✔New words from a dictionary and a thesaurus
Prior polarity subjectivity lexicon
Exit criteria: all words in the lexicon are tagged with:
✔Prior polarity: positive, negative, both, or neutral
✔Reliability: strongly subjective (strongsubj), weakly subjective (weaksubj)
{Subjectivity lexicon file to be presented}
Corpus
Corpus preparation
●Multi-Perspective Question Answering (MPQA) Opinion Corpus
●Given a set of subjective expressions (identified from existing annotations in the MPQA corpus), their contextual polarity was annotated
●Instances tagged as direct-subjective (direct reference to private states) and expressive-subjective (indirectly expressing private states) are considered
Corpus: Annotation Scheme
●Interpret the whole sentence:
  They have not succeeded, and will never succeed, in breaking the will of this valiant people.
●Mark the polarity of subjective expressions as positive, negative, both, or neutral:
  African observers generally approved of his victory while Western governments denounced it.
  Besides, politicians refer to good and evil.
  Jerome says the hospital feels no different than a hospital in the states.
Agreement study: is this annotation reliable?
  10 documents with 447 subjective expressions; Kappa: 0.72 (82% agreement)
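For reference, the kappa statistic corrects raw agreement for chance; a minimal Cohen's-kappa computation on toy labels (the study's 447 actual annotations are not reproduced here):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences."""
    assert len(a) == len(b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[l] * cb[l] for l in set(ca) | set(cb)) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Toy label sequences, for illustration only.
gold  = ["pos", "neg", "neg", "neutral", "pos", "neg"]
other = ["pos", "neg", "pos", "neutral", "pos", "neg"]
print(round(cohens_kappa(gold, other), 2))
```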
Corpus (*annotated corpus to be presented)
●425 documents (15,991 subjective expressions in 8,984 sentences) from the corpus were annotated
●28% of these sentences contain no subjective expressions (SE)
●25% contain only one SE
●47% (4,247 sentences) contain two or more SEs (including polar and neutral)
●10-fold cross-validation experiments
Gold Standard
Given an instance inst from the lexicon, the gold class of inst is defined as:

if inst is not in a subjective expression:
    goldclass(inst) = neutral
else if inst is in at least one positive and one negative subjective expression:
    goldclass(inst) = both
else if inst is in a mixture of negative and neutral:
    goldclass(inst) = negative
else if inst is in a mixture of positive and neutral:
    goldclass(inst) = positive
else:
    goldclass(inst) = contextual polarity of the subjective expression
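The rules above transcribe directly into code; `expr_polarities` is a hypothetical representation of the contextual polarities of the subjective expressions containing the clue instance:

```python
# Direct transcription of the gold-class rules; expr_polarities lists the
# contextual polarities of the subjective expressions containing the clue.
def gold_class(expr_polarities):
    if not expr_polarities:                  # not in any subjective expression
        return "neutral"
    pols = set(expr_polarities)
    if "positive" in pols and "negative" in pols:
        return "both"
    if pols == {"negative", "neutral"}:
        return "negative"
    if pols == {"positive", "neutral"}:
        return "positive"
    return expr_polarities[0]                # single (uniform) polarity

print(gold_class([]))                        # neutral
print(gold_class(["positive", "negative"]))  # both
print(gold_class(["negative", "neutral"]))   # negative
```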
Why not use prior polarity alone to identify contextual polarity?
●A classifier that simply assumes the contextual polarity of a clue instance is the same as the clue's prior polarity
●Evaluated on a small amount of data that is not part of the data used in the actual experiments
●Accuracy: 48%
Step-1 Neutral-Polar Classification
Broad classification of the 28 features used in Step 1:
●Word features
●Modification features
●Structure features
●Sentence features
●Document features
Step-1 Neutral-Polar Classification: Word features
●From the corpus: Bush's visit to China has been a success.
  From the subjectivity lexicon:
●From bundesliga.com: World champions Germany qualified for the finals, with a perfect 10 wins ...
  From the subjectivity lexicon:
Step-1 Neutral-Polar Classification: Modification features
Dependency parse tree
Results for Step-1
                 Acc    Polar (Rec / Prec / F)   Neutral (Rec / Prec / F)
  word token     73.6   45.3 / 72.2 / 55.7       89.9 / 74.0 / 81.2
  word+priorpol  74.2   54.3 / 68.6 / 60.6       85.7 / 76.4 / 80.7
  28 features    75.9   56.8 / 71.6 / 63.4       87.0 / 77.7 / 82.1
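The F columns are, assuming the standard F-measure, the harmonic mean of the reported precision and recall; a quick check reproduces the polar-F figures up to the rounding of the published values:

```python
# Sanity check: F = harmonic mean of precision and recall.
def f_measure(recall, precision):
    return 2 * precision * recall / (precision + recall)

for name, r, p in [("word token", 45.3, 72.2),
                   ("word+priorpol", 54.3, 68.6),
                   ("28 features", 56.8, 71.6)]:
    print(f"{name}: polar F = {f_measure(r, p):.1f}")
```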
Step-2 Polarity Classification
Broad classification of the features used in Step 2:
●Word features
●Polarity features
Ideal case: 3-way classification (positive, negative, both)
Actual case: a few neutral cases still remain after Step 1, so 4-way classification (including neutral)
Step-2 Polarity Classification: Word features (word token and prior polarity, as before) and Polarity features
●Negated (binary):
  not good
  does not look very good
  not only good but amazing (negation does not flip polarity here)
●Negated subject (binary):
  No politically prudent Israeli could support either of them
●Modifies polarity (5 values: positive, negative, neutral, both, notmod):
  substantial: negative (prior polarity of the parent)
●Modified by polarity (5 values: positive, negative, neutral, both, notmod):
  challenge: positive (prior polarity of the child)
●Conjunction polarity (5 values: positive, negative, neutral, both, notmod):
  good: negative, evil: positive (in "good and evil")
Step-2 Polarity Classification: Polarity features (continued)
●General polarity shifter (binary): little threat; contains little truth
●Negative polarity shifter (binary): lack of understanding
●Positive polarity shifter (binary): abate the damage
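A hedged sketch of two of these binary features follows. The word lists are tiny illustrative stand-ins for the paper's shifter lists, and the backward word window is a simplification: the paper's negation feature is defined over the dependency parse, not a flat window.

```python
# Illustrative-only word lists; the paper uses much larger curated lists
# and parse-based, not window-based, feature definitions.
GENERAL_SHIFTERS = {"little", "lack", "few"}
NEGATIONS = {"not", "no", "never", "n't"}

def polarity_features(tokens, i):
    """Binary features for the clue instance at position i (4-word window)."""
    window = [t.lower() for t in tokens[max(0, i - 4):i]]
    return {
        "negated": any(t in NEGATIONS for t in window),
        "general_shifter": any(t in GENERAL_SHIFTERS for t in window),
    }

toks = "does not look very good".split()
print(polarity_features(toks, 4))  # {'negated': True, 'general_shifter': False}
```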
Results for Step-2
                 Acc    Positive (Rec / Prec / F)   Negative (Rec / Prec / F)
  word token     61.7   59.3 / 63.4 / 61.2          83.9 / 64.7 / 73.1
  word+priorpol  63.0   69.4 / 55.3 / 61.6          80.4 / 71.2 / 75.5
  10 features    65.7   67.1 / 63.3 / 65.1          82.1 / 72.9 / 77.2

                 Both (Rec / Prec / F)   Neutral (Rec / Prec / F)
  word token     9.2 / 35.2 / 14.6       30.2 / 50.1 / 37.7
  word+priorpol  9.2 / 35.2 / 14.6       33.5 / 51.8 / 40.7
  10 features    11.2 / 28.4 / 16.1      41.4 / 52.4 / 46.2
Other experiments
● Feature evaluation for usefulness on the task
  ➢ Feature ablation experiments
● Investigating the importance of recognizing neutral instances (how noise from Step 1 affects overall performance)
  ➢ Two sets of polarity classification in Step 2:
    ● Experiment with all pure polar instances from the gold standard (manually annotated)
    ● Experiment with polar instances identified automatically in Step 1 (neutral-polar classifier)
Other experiments
●Experimentation with four different types of machine learning classifiers (feature representation changed as required by each algorithm):
  ●Boosting: BoosTexter AdaBoost.MH
  ●Memory-based learning: TiMBL IB1 (k-nearest neighbor)
  ●Rule learning: Ripper
  ●Support vector learning: SVM-light and SVM-multiclass
    ●SVM-light for binary classification (neutral-polar classification)
    ●SVM-multiclass for experiments with more than two classes
●Two-step versus one-step recognition of contextual polarity
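One of the four learner types, memory-based learning (TiMBL IB1), is essentially k-nearest-neighbour classification; a pure-Python sketch over toy binary feature vectors (an illustration of the idea, not the TiMBL implementation):

```python
from collections import Counter

# Memory-based (k-NN) classification sketch in the spirit of IB1:
# store all training instances, classify by majority vote of the k
# most similar ones (here: simple feature-overlap similarity).
def knn_classify(train, x, k=3):
    """train: list of (feature_vector, label); x: feature vector."""
    def overlap(a, b):                    # number of matching features
        return sum(ai == bi for ai, bi in zip(a, b))
    neighbours = sorted(train, key=lambda tv: -overlap(tv[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

train = [((1, 0, 1), "polar"), ((1, 0, 0), "polar"), ((0, 1, 0), "neutral"),
         ((0, 1, 1), "neutral"), ((1, 1, 1), "polar")]
print(knn_classify(train, (1, 0, 1)))  # polar
```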
Related work
●Hatzivassiloglou and McKeown (1997); Kamps and Marx (2002)
  ●Learning words and phrases with prior positive or negative polarity
  ●The subjectivity lexicon builds on this line of work
●Yu and Hatzivassiloglou (2003), Kim and Hovy (2004), Hu and Liu (2004), and Grefenstette et al.
  ●Sentence-level sentiment analysis by averaging, multiplying, or counting the prior polarities of instances of lexicon words in the sentence; local negations reverse polarity
  ●Assume one sentiment per sentence
●Nasukawa, Yi, and colleagues (Nasukawa and Yi, 2003; Yi et al., 2003)
  ●Phrase-level sentiment analysis (closest to this work)
  ●Find sentiment expressions for a given subject and determine the polarity of the sentiments
  ●Cover a much smaller proportion of the expressions in the corpus
Future work
●To identify features that represent more complex interdependencies between polarity clues
●To expand the lexicon for acquiring the prior polarity of words (improves the coverage of sentiment expressions)
●Extending the lexicon would also add more neutral instances, so whether performance improves is an empirical question

What has happened with the MPQA corpus meanwhile:
●1.2: polarity annotations included as a result of this work (Theresa Ann Wilson, 2008)
●2.0: attitude (arguments, agreements) and target annotations (what each attitude is about) added
●3.0: eTarget annotations ("Event-level Sentiment Analysis" by Lingjia Deng and Janyce Wiebe, 2015), e.g.: Imam issued the fatwa against Salman Rushdie for insulting the Prophet
●A new version of the MPQA Opinion Corpus was released in June 2017 by Theresa Wilson; the full text has been requested
Critiques
●Non-performing features
  ●Little effect on the results even when those features are removed
  ●Dimensionality could be reduced
  ●The authors claim that only the combination of all features gives better performance; the latest feature set has been requested to verify whether this has been revisited meanwhile
●Not many additional words in the current version of the subjectivity lexicon, although extending it was stated as future work
●The work used single-word prior polarities from the lexicon; performance might improve if the lexicon were extended with commonly used combinations of subjectivity clues/phrases (subject to test)
●Sarcasm: let's think out loud (subject to test)
Summary
●A two-step approach to phrase-level sentiment analysis:
  ●Determine whether an expression is neutral or polar
  ●Determine the contextual polarity of those that are polar
●A wide range of features in the two stages of classification
●Automatically identifies the contextual polarity of a large subset of sentiment expressions
●4 tags: positive, negative, both, or neutral
●Positive and negative words from a lexicon are used in neutral contexts much more often than might be expected
References
●Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis.
●Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. Recognizing Contextual Polarity: An Exploration of Features for Phrase-Level Sentiment Analysis.
●J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack. 2003. Sentiment Analyzer: Extracting Sentiments about a Given Topic Using Natural Language Processing Techniques.
●T. Nasukawa and J. Yi. 2003. Sentiment Analysis: Capturing Favorability Using Natural Language Processing.
●http://mpqa.cs.pitt.edu/corpora/mpqa_corpus/
●http://mpqa.cs.pitt.edu/lexicons/subj_lexicon/
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extraction
Wei Jin , Hung Hay Ho, Rohini K. Srihari
Sharmila Ragunathan (279188)
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
Outline:
➢Introduction
➢Objectives
➢System Framework
➢Entity types and tag sets
➢Lexicalized HMMs Integrating POS
➢Information Propagation
➢Token transformation
➢Decoding
➢Opinion Sentence Extraction
➢Determining Opinion Orientation
➢Bootstrapping
➢Evaluation and Results
➢Conclusion
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
2
Introduction:
Customers share their opinions and hands-on experiences of products they have purchased.
Reading through all the reviews is difficult for a potential customer.
This paper aims to design a system that is capable of extracting, learning, and classifying product entities and opinion expressions automatically from product reviews.
Objectives:
How to automatically extract potential product entities and opinion entities from the reviews?
How to identify opinion sentences which describe each extracted product entity?
How to determine opinion orientation (positive or negative) given each recognized product entity?
➢Opinion mining is a type of natural language processing for tracking the mood of the public about a particular product.
➢Sentiment analysis collects and categorizes opinions about a product.
➢Automated opinion mining often uses machine learning and AI to mine text for sentiment.
System Framework
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan 6
Entity Types and Tag Sets
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan 7
Table 1. Definitions of entity types and examples:
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
8
Table 2. Basic tag set and its corresponding entities:
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
9
Table 3. Pattern tag set and its corresponding pattern:
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
10
Hybrid tag representation:
<tbtp>w1</tbtp> … <tbtp>wn</tbtp>

where
  wi stands for a single word,
  tb represents a basic tag, and
  tp represents a pattern tag.

• This hybrid-tag labeling method is applied to all the training data and system outputs.
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
11
Hybrid tag and basic tag, example:
"I love the ease of transferring the pictures to my computer."
• Hybrid tags:
<BG>I</BG> <OPINION_POS_EXP>love</OPINION_POS_EXP> <BG>the</BG> <PROD_FEAT-BOE>ease</PROD_FEAT-BOE> <PROD_FEAT-MOE>of</PROD_FEAT-MOE> <PROD_FEAT-MOE>transferring</PROD_FEAT-MOE> <PROD_FEAT-MOE>the</PROD_FEAT-MOE> <PROD_FEAT-EOE>pictures</PROD_FEAT-EOE> <BG>to</BG> <BG>my</BG> <BG>computer</BG>
• Basic tags:
<BG>I</BG> <OPINION_POS_EXP>love</OPINION_POS_EXP> <BG>the</BG> <PROD_FEAT>ease of transferring the pictures</PROD_FEAT> <BG>to</BG> <BG>my</BG> <BG>computer</BG>
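The relationship between the two representations can be sketched in code: parse the hybrid-tag markup into (tag, word) pairs and regroup consecutive BOE/MOE/EOE spans into a single basic-tag phrase (a simplified illustration, not the system's actual converter):

```python
import re

def parse_hybrid(text):
    """Extract (tag, word) pairs from <TAG>word</TAG> markup."""
    return re.findall(r"<([^/>]+)>([^<]+)</\1>", text)

def to_basic(pairs):
    """Merge BOE/MOE/EOE pattern-tag spans into one basic-tag phrase."""
    out, phrase = [], []
    for tag, word in pairs:
        if tag.endswith(("-BOE", "-MOE", "-EOE")):
            phrase.append(word)
            if tag.endswith("-EOE"):
                out.append((tag.split("-")[0], " ".join(phrase)))
                phrase = []
        else:
            out.append((tag, word))
    return out

tagged = ("<BG>I</BG><OPINION_POS_EXP>love</OPINION_POS_EXP><BG>the</BG>"
          "<PROD_FEAT-BOE>ease</PROD_FEAT-BOE><PROD_FEAT-MOE>of</PROD_FEAT-MOE>"
          "<PROD_FEAT-EOE>transferring</PROD_FEAT-EOE>")
print(to_basic(parse_hybrid(tagged)))
```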
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
12
Lexicalized HMMs Integrating POS:
• Linguistic features such as part-of-speech tags and lexical patterns are integrated into the HMMs.
• An observable state is represented by a pair (word_i, POS(word_i)).
• Given the word sequence W = w1 w2 w3 … wn and the POS sequence S = s1 s2 s3 … sn, find the sequence of hybrid tags T that maximizes the conditional probability P(T|W,S).
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
13
Information Propagation using an Entity's Synonyms, Antonyms and Related Words
• Automatically propagate the information of each expert-tagged entity to its synonyms, antonyms, similar words and related words.
• An entity can be a single word or a phrase.
• By expanding each single word to a list of its related words, different word combinations can be formed.
• For example: "Good picture quality" is an expert-tagged opinion sentence.
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
14
Information propagation
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan 15
• A dictionary program was built to return an input word's synonyms, antonyms, similar words and related words using Microsoft Word's thesaurus.
• WordNet was also tried, but it returned too many less commonly used synonyms and antonyms.
• Most reviewers, however, tend to use common words to express their opinions.
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
16
Token Transformations:
• Many entities may be overly specific.
• For example:
➢ “I love its 28mm lens”
➢ “I love its 300mm lens”.
• Both sentences talk about lens. They could be ideally grouped together as
➢ “I love its Xmm lens”
where X can be any numerical value.
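The token transformation above can be sketched with a regular expression; the set of generalized units (mm, x, mp) is an illustrative assumption, not the paper's actual list:

```python
import re

# Generalize overly specific numeric details to a placeholder so that
# "28mm lens" and "300mm lens" share training statistics.  The unit
# list is an assumption for illustration.
def generalize(sentence):
    return re.sub(r"\b\d+(\.\d+)?(?=(mm|x|mp)\b)", "X", sentence, flags=re.I)

print(generalize("I love its 28mm lens"))   # I love its Xmm lens
print(generalize("I love its 300mm lens"))  # I love its Xmm lens
```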
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
17
• This transformation generalizes the information contained in sentences and is useful for solving the problem of sparseness in the training data.
• In this framework, the transformation handles highly detailed information in sentences, such as model number, focal length, and ISO.
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan 18
Decoding
• The decoding algorithm finds the most probable sequence of hybrid tags for a given sequence of known words and their corresponding parts of speech.
• A hybrid tag of an observable word comprises a category (basic) tag and a pattern tag.
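Such decoding is typically done with the Viterbi algorithm; a toy sketch over (word, POS) observations follows. The probability tables are invented stand-ins for the lexicalized HMM's smoothed estimates:

```python
# Toy Viterbi decoder over (word, POS) observation pairs; the tables
# below are illustrative, not the system's trained probabilities.
def viterbi(obs, tags, start_p, trans_p, emit_p):
    V = [{t: (start_p[t] * emit_p[t].get(obs[0], 1e-6), [t]) for t in tags}]
    for o in obs[1:]:
        row = {}
        for t in tags:
            p, path = max(
                (V[-1][prev][0] * trans_p[prev][t] * emit_p[t].get(o, 1e-6),
                 V[-1][prev][1]) for prev in tags)
            row[t] = (p, path + [t])
        V.append(row)
    return max(V[-1].values())[1]          # best final path

tags = ["BG", "OPINION_POS_EXP"]
obs = [("I", "PRP"), ("love", "VBP")]
start = {"BG": 0.9, "OPINION_POS_EXP": 0.1}
trans = {"BG": {"BG": 0.7, "OPINION_POS_EXP": 0.3},
         "OPINION_POS_EXP": {"BG": 0.8, "OPINION_POS_EXP": 0.2}}
emit = {"BG": {("I", "PRP"): 0.5, ("love", "VBP"): 0.01},
        "OPINION_POS_EXP": {("love", "VBP"): 0.6}}
print(viterbi(obs, tags, start, trans, emit))  # ['BG', 'OPINION_POS_EXP']
```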
Opinion Sentence Extraction
• Opinion sentences in this work are defined as sentences that express opinions on product-related entities. Two kinds of sentences are not treated as opinion sentences:
1. Sentences that describe product-related entities without expressing reviewers' opinions.
2. Sentences that express opinions on another product model's entities.
Determining Opinion Orientation
• Due to the complexity and flexibility of natural language, opinion orientation is not simply equal to the orientation of the opinion entity (word/phrase).
• For example: "I can tell you right now that the auto mode and the program modes are not that good."
• The orientation of the matching opinion entity becomes the initial opinion orientation for the corresponding product entity.
• Next, natural language rules reflecting sentence context are employed to address specific language constructs, such as the presence of negation words (e.g., not, didn't, don't), which may change the opinion orientation.
Exceptions (the negation is ignored when):
➢A negation word appears in front of a coordinating conjunction (e.g., and, or, but).
➢A negation word appears after the appearance of a product entity during the backward search within the five-word window.
➢A negation word appears before another negation word.
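A hedged sketch of this negation step: flip the initial orientation if a negation word occurs in the backward five-word window, honouring the conjunction and double-negation exceptions above (the product-entity exception is omitted for brevity, and the word lists are illustrative):

```python
# Simplified negation handling; word lists are illustrative stand-ins
# and the "entity in between" exception is not implemented here.
NEGATIONS = {"not", "didn't", "don't", "no", "never"}
CONJUNCTIONS = {"and", "or", "but"}

def orientation(tokens, opinion_pos, initial):
    """Flip `initial` if a non-excepted negation precedes the opinion word
    within a backward five-word window."""
    window = [t.lower() for t in tokens[max(0, opinion_pos - 5):opinion_pos]]
    for j, w in enumerate(window):
        nxt = window[j + 1] if j + 1 < len(window) else ""
        if w in NEGATIONS and nxt not in CONJUNCTIONS | NEGATIONS:
            return "negative" if initial == "positive" else "positive"
    return initial

toks = "the auto mode is not that good".split()
print(orientation(toks, toks.index("good"), "positive"))  # negative
```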
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
22
EXPERIMENTS:
• Data: Amazon digital camera reviews.
• For each review page, the individual review content, model number and manufacturer name were extracted from the HTML documents.
• Sentence segmentation of the review documents.
• POS parsing.
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
23
Training Design:
• 1,728 review documents in total.
• One set (293 documents for 6 cameras): opinion sentences were identified, and product entities, opinion entities and opinion orientations were manually labeled using the tag sets.
• The remaining documents (1,435 documents for 10 cameras): used in the bootstrapping process to self-learn new vocabularies.
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
24
Bootstrapping:
• Extracting high-confidence data through self-learning:
1. Extract and distribute high-confidence data to each worker.
2. Split the seed data into t1 and t2 by random selection; each half is used as the seeds for one worker's HMM.
3. Train the HMM classifiers, tag the documents in the bootstrap document set, and obtain a new set of tagged review documents.
4. Extract opinion sentences that are agreed upon by both classifiers; only identical sentences with identical tags are considered to agree with one another.
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
25
5. A hash value is then calculated for each opinion sentence extracted in step 4 and compared with those of sentences already stored in the database.
6. If it is a newly discovered sentence, the master stores it in the database.
7. Split the new data into t1 and t2 and add them to the training sets of the two workers respectively.
8. Repeat the process until no more new data is discovered.
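The eight steps above can be sketched as a loop; `train` and `tag` are hypothetical placeholders for the real HMM training and tagging routines:

```python
# High-level sketch of the bootstrapping loop: two workers trained on
# disjoint seed halves tag the pool, and only sentences on which both
# agree (hash-deduplicated) are added back to the training sets.
def bootstrap(seed1, seed2, pool, train, tag):
    seen = set()                       # hashes of sentences already stored
    while True:
        m1, m2 = train(seed1), train(seed2)
        agreed = [s for s in pool if tag(m1, s) == tag(m2, s)]
        new = [s for s in agreed if hash(s) not in seen]
        if not new:                    # no newly discovered data: stop
            return seed1 + seed2
        seen.update(hash(s) for s in new)
        seed1 = seed1 + new[0::2]      # split new data between the workers
        seed2 = seed2 + new[1::2]

# Trivial stand-in train/tag functions for illustration only.
result = bootstrap(["a"], ["b"], ["c", "d", "e"],
                   train=lambda d: d, tag=lambda m, s: "T")
print(result)  # ['a', 'c', 'e', 'b', 'd']
```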
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
26
Bootstrapping process
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan 27
Evaluation Process:
• The effectiveness of the proposed framework was evaluated by measuring the recall and precision of extracted entities, opinion sentences and opinion orientations.
➢System performance: compare the results tagged by the system with the manually tagged truth data.
➢Entity recognition: the same word/phrase is identified and classified correctly as one of the four pre-defined entity types.
➢Opinion sentence extraction: the exact same sentence from the same document is identified, compared with the truth data.
➢Opinion orientation classification: the same entity and entity type are identified with the correct orientation.
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
28
Evaluation Results and Discussions:
• The proposed machine learning framework performs significantly better than the rule-based baseline system in terms of entity extraction, opinion sentence recognition and opinion polarity classification.
• Besides effectively extracting frequent entities, the system also excels at identifying important but infrequently mentioned entities, which were under-analyzed or ignored by previously proposed methods.
• The system also proposes potential non-noun product entities, such as "engineered" and "operated". These were ignored by previous approaches, which assumed that product entities must be nouns or noun phrases.
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
29
CONCLUSIONS:
• A novel and robust machine learning system is designed for opinion mining and extraction. The model provides solutions for several problems that have not been addressed by previous approaches.
• The model naturally integrates multiple linguistic features into automatic learning.
• The system can predict new potential product and opinion entities based on the patterns it has learned.
• Complex product entities and opinion expressions as well as infrequently mentioned entities can be effectively and efficiently identified.
• A bootstrapping approach combining active learning reduces the manual labeling effort.
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
30
REFERENCES:
➢ [1] Lee, S. Z., Tsujii, J., and Rim, H. C. 2000. Lexicalized Hidden Markov Models for Part-of-Speech Tagging. In Proceedings of the 18th International Conference on Computational Linguistics (COLING'00), 481-487.
➢ [2] Fu, G. and Luke, K. K. 2005. Chinese Named Entity Recognition using Lexicalized HMMs. ACM SIGKDD Explorations Newsletter 7,1 (2005), 19-25.
➢ [3] Turney, P. D. 2002. Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL’02), 417-424.
➢ [4] Turney, P. D. and Littman, M. L. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Trans. On Information Systems, 21, 4 (2003), 315-346.
➢ [5] Dave, K., Lawrence, S., and Pennock, D. M. 2003. Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. In Proceedings of the 12th international conference on World Wide Web (WWW’03), 519-528.
➢ [6] Pang, B., Lee, L., and Vaithyanathan, S. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP’02), 79-86.
➢ [7] Pang, B. and Lee, L. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL’04), 271-278.
➢ [8] Das, S. and Chen, M. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the 8th Asia Pacific Finance Association Annual Conference (APFA’01).
➢ [9] Hu, M. and Liu, B. 2004. Mining and Summarizing Customer Reviews. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), 168-177
➢ [10] Zhuang, L., Jing, F., and Zhu, X. 2006. Movie Review Mining and Summarization. In Proceedings of the International Conference on Information and Knowledge Management (CIKM’06), 43-50.
➢ [11] Popescu, A. and Etzioni, O. 2005. Extracting Product Features and Opinions from Reviews. In Proceeding of 2005 Conference on Empirical Methods in Natural Language Processing (EMNLP’05), 339-346.
➢ [12] Ding, X., Liu, B., and Yu, P. S. 2008. A Holistic Lexicon based Approach to Opinion Mining. In Proceeding of the international conference on Web Search and Web Data Mining (WSDM’08), 231-239.
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan
31
Thank you !!!
"OpinionMiner: A Novel Machine Learning System for Web opinion mining and extraction" by Sharmila Ragunathan 32