Date posted: 25-May-2015
Category: Technology
Uploaded by: nobal-niraula
1
“Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics”
Authors: Anindya Ghose, Panagiotis G. Ipeirotis, Member, IEEE
Course: Topics in Data Mining
Presenter: Nobal Niraula
December 8, 2010 @ UOM
2
Introduction
Gathering variables (attributes)
Explanatory study using econometric regression
◦ Hypotheses for sales
◦ Hypotheses for perceived usefulness
Prediction
◦ Helpfulness
◦ Impact on sales
Conclusion
Outline
3
Product-related word-of-mouth conversations in online markets
Reviewers contribute time and energy
Volume of reviews can be high
Benefits
◦ Customers: usefulness / helpfulness
    Average star rating: bimodal
    Peer review: biased
    Helpfulness = helpful votes / total votes
    “Spotlight Review” on Amazon.com
◦ Manufacturers: influence on sales
    Helpful reviews are not necessarily the ones that lead to increases in sales!
    Reviews that affect sales the most should be presented first to manufacturers
Introduction (1)
4
The paper is unique in looking at how subjectivity level, readability and spelling errors in the text of reviews affect product sales and the perceived helpfulness of these reviews.
Introduction (2)
5
Two-Level Study
◦ Explanatory econometric analysis
    Identify aspects of a review and its reviewer
◦ Prediction model using “Random Forests”
    How peer consumers are going to rate a review
    How sales will be affected by the posted review
Predicting Helpfulness and Importance
6
Product Reviews
7
Sample Review
8
Reviewer’s Profile
9
Product Rank
10
Variables Collection
Products
◦ Audio and video players (144 products)
◦ Digital cameras (109 products)
◦ DVDs (158 products)
Product and Sales Data: Retail Price, Sales Rank, Average Rating, Number of Reviews, Elapsed Date
Reviewer History: Number of Past Reviews, Reviewer History Macro, Reviewer History Micro, Past Helpful Votes, Past Total Votes
Reviewer Characteristics: Reviewer Rank, Top-10 Reviewer, Top-50 Reviewer, Top-100 Reviewer, Top-500 Reviewer, Real Name, Nick Name, Hobbies, Birthday, Location, Web Page, Interests, Snippet, Any Disclosure
Individual Review: Moderate Review, Helpful Votes, Total Votes, Helpfulness
Review Readability: Length (Chars), Length (Words), Length (Sentences), Spelling Errors, ARI, Gunning Index, Coleman–Liau Index, Flesch Reading Ease, Flesch–Kincaid Grade Level, SMOG
Review Subjectivity: AvgProb, DevProb
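Several of the readability variables listed above are simple functions of character, word, and sentence counts. As a minimal sketch, the Automated Readability Index (ARI) could be computed as follows. The tokenization (whitespace-separated words, sentences split on ".", "!" or "?") and the function name are assumptions made for illustration, not necessarily what the authors used.

```python
import re


def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    words = text.split()
    # Assumption: only letters and digits count as characters.
    chars = sum(1 for word in words for ch in word if ch.isalnum())
    # Assumption: a naive split on ., ! or ? is good enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


# Hypothetical review text, used only to exercise the function.
review = "The camera is great. It takes sharp photos!"
ari = automated_readability_index(review)
```

Higher ARI values correspond to text that needs a higher grade level to read, so short sentences with short words score low.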
11
Readability Analysis
◦ Automated Readability Index
◦ Coleman–Liau Index
◦ Flesch–Kincaid Grade Level
◦ Gunning Fog Index
◦ SMOG
Subjectivity Analysis
◦ Stylistic choices: “subjective” vs. “objective”
◦ Each document gets a “subjectivity score”
    AvgProb(r): high value → many subjective sentences
    DevProb(r): high value → mixed (subjective + objective) sentences
Text of a Review Matters !
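The two subjectivity features can be illustrated with a small sketch. It assumes each sentence of a review already has a probability of being subjective (the classifier producing these is not shown), and that AvgProb is the mean and DevProb the standard deviation of those probabilities. The use of the population standard deviation and all names here are assumptions for illustration.

```python
from statistics import mean, pstdev


def subjectivity_features(sentence_probs):
    """Return (AvgProb, DevProb) for one review.

    sentence_probs: per-sentence probabilities of being subjective,
    assumed to come from some external classifier.
    """
    avg_prob = mean(sentence_probs)
    # Assumption: population standard deviation; a sample version would also work.
    dev_prob = pstdev(sentence_probs)
    return avg_prob, dev_prob


# Hypothetical reviews: one consistently subjective, one mixing both styles.
mostly_subjective = subjectivity_features([0.9, 0.8, 0.9, 0.8])
mixed = subjectivity_features([0.9, 0.1, 0.9, 0.1])
```

The first review gets a high AvgProb and a low DevProb, while the mixed review gets a middling AvgProb and a high DevProb, matching the reading given on the slide.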
12
Hypothesis 1a:
◦ All else equal, a change in the subjectivity level and mixture of objective and subjective statements in reviews will be associated with a change in sales.
Hypothesis 1b:
◦ All else equal, a change in the readability score of reviews will be associated with a change in sales.
Hypothesis 1c:
◦ All else equal, a decrease in the proportion of spelling errors in reviews will be positively related to sales.
Hypothesis for Sales
13
ln(D) = a + b * ln(S)
◦ D is the unobserved product demand
◦ S is its observed sales rank
◦ Pareto distribution
◦ High sales rank → low demand
Key Observation:
◦ “Sales rank” on Amazon.com can be taken as a PROXY for demand!
Effect on Product Sales
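To make the rank-to-demand relationship ln(D) = a + b * ln(S) concrete, here is a small sketch. The values of a and b are hypothetical (the slides do not give them). The only property used is that b is negative, so a numerically higher sales rank means lower demand.

```python
import math


def demand_from_rank(sales_rank, a, b):
    """Invert ln(D) = a + b * ln(S) to get demand D from sales rank S."""
    return math.exp(a + b * math.log(sales_rank))


# Hypothetical parameters, chosen only so that b < 0.
A, B = 10.0, -0.8

demand_rank_10 = demand_from_rank(10, A, B)
demand_rank_10000 = demand_from_rank(10_000, A, B)
```

Because the relationship is log-linear, a change in ln(S) maps to a proportional change in ln(D), which is why regressions on log sales rank can stand in for regressions on demand.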
14
Descriptive Statistics for Econometric Analysis
15
Model to Test Hypothesis 1
μk is a product fixed effect that accounts for unobserved heterogeneity across products, and εkt is the error term.
Control Variables: Retail Price, Avg. Numeric Rating, Elapsed Date, Number of Reviews
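One common way to handle a product fixed effect such as μk is the "within" transformation: subtract each product's mean from its observations, then run ordinary least squares on the demeaned data. The sketch below shows this for a single regressor on toy data. It is an illustration of the general technique, not the authors' estimation code, and every name and number in it is hypothetical.

```python
from collections import defaultdict


def within_slope(rows):
    """Slope of y on x after removing per-product means.

    rows: iterable of (product_id, x, y) observations.
    """
    by_product = defaultdict(list)
    for product_id, x, y in rows:
        by_product[product_id].append((x, y))

    sxy = 0.0
    sxx = 0.0
    for observations in by_product.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            # Demeaning cancels the product-specific constant mu_k.
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx


# Toy data: y = mu_k + TRUE_SLOPE * x, with a different mu_k per product.
MU = {"A": 3.0, "B": 7.0}
TRUE_SLOPE = -0.5
X_VALUES = {"A": [0.2, 0.6, 0.9], "B": [0.1, 0.5, 0.8]}
rows = [(pid, x, MU[pid] + TRUE_SLOPE * x) for pid, xs in X_VALUES.items() for x in xs]

slope = within_slope(rows)
```

Even though the two products sit at very different levels, the estimated slope recovers the common effect of x, which is the point of including a fixed effect.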
16
Empirical Results for Product Sales
Note:
1. A negative (-ve) coefficient → decrease in sales rank → increase in sales
2. Variables that increase sales: AvgProb, Readability, Spelling Errors
3. Variables that decrease sales: Retail Price, DevProb
Also: reviews with rating <= 2 are associated with increased sales
17
Hypothesis 1a:
◦ Highly subjective sentences increase sales
◦ A mixture of subjective and objective sentences is negatively associated with product sales, compared to highly subjective or highly objective sentences.
Hypothesis 1b:
◦ Higher readability scores are associated with higher sales
Hypothesis 1c:
◦ An increase in the proportion of spelling mistakes decreases product sales for some “experience products” such as DVDs; however, the proportion of spelling errors has no significant impact on sales for “search products”
Reviews that rate products negatively can be associated with increased product sales when the review text is informative and detailed!
Conclusion (1)
18
Hypothesis 2a:
◦ All else equal, a change in the subjectivity level and mixture of objective and subjective statements in a review will be associated with a change in the perceived helpfulness of that review.
Hypothesis 2b:
◦ All else equal, a change in the readability of a review will be associated with a change in the perceived helpfulness of that review.
Hypothesis 2c:
◦ All else equal, a decrease in the proportion of spelling errors in a review will be positively related to the perceived helpfulness of that review.
Hypothesis 2d:
◦ All else equal, an increase in the average helpfulness of a reviewer’s historical reviews will be positively related to the perceived helpfulness of a review posted by that reviewer.
Hypothesis for Helpfulness
19
Effect on Helpfulness
μk is a product fixed effect that controls for differences in the average helpfulness of reviews across products, and εkt is the error term.
20
Empirical Results for Helpfulness
Note:
(-ve) → lower helpfulness
Negative relations: AvgProb, Spelling Error, Moderate
Positive relations: DevProb **, Disclosure, Readability, Reviewer History Macro, Number of Reviews
21
Hypothesis 2a:
◦ In general, a mixture of subjective and objective elements is found more informative (helpful) by users.
◦ For feature-based goods, users prefer reviews with more objective information and fewer subjective sentences
◦ For experience goods, e.g. DVDs, users expect few objective sentences but more subjective sentences
Hypotheses 2b-2d:
◦ An increase in the readability of reviews has a positive and statistically significant impact on review helpfulness
◦ An increase in the proportion of spelling errors has a negative and statistically significant impact on review helpfulness for audio-video products and DVDs.
◦ Past historical information about reviewers has a statistically significant effect on the perceived helpfulness of reviews
Conclusion (2)
22
Main goal
◦ Is the review informative or not?
◦ Does the review have an impact on sales or not?
Question: given the helpfulness value of a review, decide whether it is useful or not
◦ Helpfulness = (helpful votes / total votes)
◦ Continuous-to-binary conversion
◦ Threshold found is 60%
Classification
◦ Regression model can be used
◦ Binary classification
Predictive Modeling
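The continuous-to-binary conversion described on this slide can be sketched as follows. The 60% threshold comes from the slide. The strict ">" comparison, the handling of reviews with no votes, and the function names are assumptions made for illustration.

```python
def helpfulness(helpful_votes, total_votes):
    """Helpfulness = helpful votes / total votes; None when there are no votes."""
    if total_votes == 0:
        return None  # Assumption: unvoted reviews are left unlabeled.
    return helpful_votes / total_votes


def is_helpful(helpful_votes, total_votes, threshold=0.60):
    """Binary label from the helpfulness ratio, using the 60% threshold."""
    score = helpfulness(helpful_votes, total_votes)
    if score is None:
        return None
    # Assumption: strictly above the threshold counts as helpful.
    return score > threshold


# Hypothetical vote counts.
label_high = is_helpful(7, 10)
label_low = is_helpful(3, 10)
label_unvoted = is_helpful(0, 0)
```

With labels of this kind, the task becomes ordinary binary classification over the review and reviewer features.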
23
Classifiers
◦ SVM vs. Random Forest
    SVM consistently performed worse, contrary to what other reports state
    Training time for SVM was significantly higher than that of Random Forest
Predicting Helpfulness (1)
24
Predicting Helpfulness (2)
25
Examining whether the difference SalesRank_{t(r)+T} − SalesRank_{t(r)}, where t(r) is the time the review is posted, is positive or negative.
Predicting Impact on Sales
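The sign test on the sales-rank difference can be sketched as below. It assumes sales rank is observed once per day and that T is a horizon measured in days; the data layout, the label strings, and the function names are hypothetical. A lower sales rank means higher sales, so a negative difference is read as a sales increase.

```python
def rank_change(rank_by_day, review_day, horizon):
    """SalesRank at t(r) + T minus SalesRank at t(r)."""
    return rank_by_day[review_day + horizon] - rank_by_day[review_day]


def sales_impact_label(rank_by_day, review_day, horizon):
    """Label the period after a review by the sign of the rank difference."""
    diff = rank_change(rank_by_day, review_day, horizon)
    if diff < 0:
        return "sales up"  # rank number fell, so the product sells better
    if diff > 0:
        return "sales down"
    return "no change"


# Hypothetical daily sales ranks keyed by day index.
ranks = {0: 1500, 1: 1400, 2: 1250, 3: 1300}
label = sales_impact_label(ranks, review_day=0, horizon=3)
```

The resulting label is the kind of target a classifier would be trained to predict from the review's features.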
26
Random Forest based prediction
◦ For experience goods such as DVDs, the classifier has lower performance
◦ Observed high correlation of “classification error” with “distribution of review ratings”
◦ Products whose reviews have widely fluctuating ratings also have reviews with widely fluctuating helpfulness votes.
◦ Highly detailed and readable reviews can have low helpfulness votes
◦ “Reviewer-related”, “review subjectivity”, and “review readability” feature sets are interchangeable!
Conclusion (3)
27
Subjectivity level, readability and spelling errors in the text of reviews affect product sales and the perceived helpfulness
Overall Conclusion
28
Anindya Ghose, Panagiotis G. Ipeirotis, "Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics," IEEE Transactions on Knowledge and Data Engineering, vol. 99, no. PrePrints, 2010
References
29
Thank You !