Date post: | 22-Jul-2015 |
Category: |
Science |
Upload: | yi-chun-nancy-chien |
Customer Opinions Analysis for Starbucks in Yelp
Web Analytics, Fall 2014
Professor Yilu Zhou
ISGB 7978
Team:
Yixi Zhang, Xiaoshan Jin, Yi Chun Chien, Yi Ting Kao
Agenda
1. Problem Statement
2. Project Design
3. Stage 1 - Analytic Pre-define
4. Stage 2 - Unstructured Data Analysis
• Correlation Analysis (Overall rating)
• How Rating Differs from Location (Overall rating)
• Feature Selection (Low rating)
• Python Feature Counts Algorithm (Low rating)
• Definition of Top Bad Performance Areas (Low rating)
• Analytics – Manhattan Visualization (Low rating)
5. Analytics Summary & Recommendation
Problem Statement
There are 212 Starbucks stores in Manhattan. Their average rating on Yelp is 2.8 stars. Some reviews carry a low rating of 1 to 2 stars.
Project Goal:
Find out the factors causing Starbucks stores’ bad performance to ensure the highest level of customer satisfaction.
Project Design
Stage 1 - Analytic Pre-define
• Platform & Tool Selection: Python, Content Analyzer and JMP
• Data collection:
• Use Python to crawl 176 Starbucks stores on Yelp
• Variables: Store location, user location, user comment, user rating
• Reviews Distribution:
- Total review number: 3052
- Average Rating: 2.8
- 74% of customers from NY; 26% of customers from other places
• Pre-define Complaints Categories
• Product, Service, Waiting-time & Environment
[Charts: Review # by User Rating and by User Location]
Stage 2 - Correlation Analysis (Overall rating)
• Target variable: User Rating Group: High (4-5 stars) vs. Low (1-2 stars)
• Independent Variable: Store Area, User Location, Comment Length
• Use a Goodness-of-Fit Test to check for correlation between the target and the independent variables
- Comment Length and Store Area are each significantly correlated with User Rating Group
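As an illustration of how an association between a categorical variable (such as Store Area) and User Rating Group can be checked, here is a minimal sketch of a Pearson chi-square statistic on a contingency table. This is a swapped-in, hand-rolled calculation, not the JMP procedure used in the project, and the counts and area names below are invented purely for illustration.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts.

    `table` is a list of rows, e.g. one row per store area and one
    column per rating group (High, Low).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if area and rating group were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts (NOT from the project data):
# rows = two store areas, columns = (High rating, Low rating).
observed = [
    [60, 40],  # hypothetical area A
    [30, 70],  # hypothetical area B
]

stat = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom.
CRITICAL_VALUE_DF1 = 3.841
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
print("significant at 5%:", stat > CRITICAL_VALUE_DF1)
```

A statistic above the critical value suggests that the rating group is not independent of the area; for larger tables the critical value has to be looked up for the corresponding degrees of freedom.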
Stage 2: How Rating Differs from Location (Overall Rating)
[Charts: Review # and Rating]
Why is Midtown East rated better than Midtown West when both areas have a similar number of reviews and similar user locations?
• Top 3 Bad Areas:
• Lower East Side
• Greenwich Village and SOHO
• Chelsea and Clinton
• Top 3 Good Areas:
• Central Park and Murray Hill
• Lower Manhattan
• Inwood and Washington Heights
Low Rating > 62%
High Rating > 52%
Stage 2 - Feature Selection (Low rating)
Assumptions:
1) All low-rating comments express only negative opinions about Starbucks;
2) An index for each feature is defined per zip code as (feature count) / (number of bad comments), so that features can be compared at the zip code level.
Content Analyzer output cleansing: stop-word removal and word stemming.
Finalized Feature list:
Product – (coffee, drink, drinks, cup, latte, tea, iced, milk, food, wrong)
Waiting time – (time, line, minutes, long, wait, slow, waiting, busy)
Environment – (bathroom, small, clean, seating)
Service – (people, service, staff, barista, baristas, rude, cashier, manager, friendly, attitude)
Stage 2 - Python Feature Counts Algorithm: (Low rating)
Calculation rule:
A review is labeled “1” for a category if any word from that category’s feature list occurs in it; otherwise it is labeled “0”.
• Assess every user review against the Product, Service, Waiting time, and Environment features;
• Group all of the feature counts by store location (zip code).
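The calculation rule and the per-zip-code index can be sketched roughly as follows. This is not the team's original script: the function names, the tokenization, and the sample reviews are assumptions made for illustration, and the index is expressed as a percentage on the assumption that it matches the percentage scale used in the later slides. The feature lists are the finalized ones from the Feature Selection slide.

```python
import re
from collections import defaultdict

# Finalized feature lists from the Feature Selection slide.
FEATURES = {
    "Product": {"coffee", "drink", "drinks", "cup", "latte", "tea",
                "iced", "milk", "food", "wrong"},
    "Waiting time": {"time", "line", "minutes", "long", "wait", "slow",
                     "waiting", "busy"},
    "Environment": {"bathroom", "small", "clean", "seating"},
    "Service": {"people", "service", "staff", "barista", "baristas",
                "rude", "cashier", "manager", "friendly", "attitude"},
}


def label_review(text):
    """Return 1 per category if any of its feature words occurs, else 0."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {cat: int(bool(words & feats)) for cat, feats in FEATURES.items()}


def index_by_zip(reviews):
    """Compute feature count / bad comment count per zip code, in percent.

    `reviews` is an iterable of (zip_code, review_text) pairs that is
    assumed to contain low-rating reviews only.
    """
    counts = defaultdict(lambda: dict.fromkeys(FEATURES, 0))
    totals = defaultdict(int)
    for zip_code, text in reviews:
        totals[zip_code] += 1
        for cat, flag in label_review(text).items():
            counts[zip_code][cat] += flag
    return {
        z: {cat: 100.0 * counts[z][cat] / totals[z] for cat in FEATURES}
        for z in totals
    }


# Hypothetical low-rating reviews (invented for illustration).
sample_reviews = [
    ("10001", "The line was long and the barista was rude."),
    ("10001", "My latte was wrong."),
    ("10002", "No seating and the bathroom was dirty."),
]

for zip_code, index in index_by_zip(sample_reviews).items():
    print(zip_code, index)
```

Matching on whole lowercase words keeps the sketch simple; the original analysis applied stemming first, which would let a single stem cover variants such as "wait" and "waiting".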
Stage 2 – Definition of Top Bad Performance Areas (Low rating)
Definition Rules (%)
Complaint       Index Range   Index Median   Top Bad Performance Index Point
Environment     10.71-60      35.36          35
Product         46.43-100     73.21          85
Service         43.75-100     71.88          85
Waiting time    44.44-100     72.22          65
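One way to apply these index points is sketched below. The cut-off values come from the table above, but the comparison rule (an area counts as a top bad performance area when its index is at or above the index point) is an assumption, as are the function name and the sample index values.

```python
# Top Bad Performance Index Points from the Definition Rules table (%).
INDEX_POINTS = {
    "Environment": 35,
    "Product": 85,
    "Service": 85,
    "Waiting time": 65,
}


def top_bad_areas(index_by_area, index_points=INDEX_POINTS):
    """List, per complaint category, the areas at or above the index point.

    `index_by_area` maps an area (or zip code) to its complaint indexes
    in percent. The ">=" rule is an assumption, not stated in the slides.
    """
    flagged = {category: [] for category in index_points}
    for area, indexes in sorted(index_by_area.items()):
        for category, cutoff in index_points.items():
            if indexes.get(category, 0.0) >= cutoff:
                flagged[category].append(area)
    return flagged


# Hypothetical index values (invented for illustration).
sample_indexes = {
    "Area A": {"Environment": 40.0, "Product": 70.0,
               "Service": 90.0, "Waiting time": 60.0},
    "Area B": {"Environment": 20.0, "Product": 88.0,
               "Service": 50.0, "Waiting time": 65.0},
}

for category, areas in top_bad_areas(sample_indexes).items():
    print(category, areas)
```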
Analytics Summary
Manhattan Top Bad Performance Areas
Environment Complaint        Product Complaint               Service Complaint               Waiting Complaint
Upper West Side              Lower East Side                 Lower Manhattan                 Central Park and Murray Hill
Chelsea and Clinton          Central Park and Murray Hill    Upper East Side                 Chelsea and Clinton
Greenwich Village and Soho   Upper East Side                 Inwood and Washington Heights   Lower Manhattan
N/A                          Inwood and Washington Heights   N/A                             Inwood and Washington Heights
N/A                          Central Harlem                  N/A
Analytics – Manhattan Visualization (Low rating)
[Figure panels: Environment Complaint, Product Complaint]
Analytics – Manhattan Visualization (Low rating)
[Figure panels: Waiting Time Complaint, Service Complaint]
Midtown East has a lower “Service Complaints” rate than Midtown West.
Recommendations
To Manager of Manhattan area:
1. The common concerns for customers across all Manhattan areas are long waiting times and bad service.
• Hire more cashiers and baristas based on each store’s situation (financially efficient).
• Train current employees to provide more professional, flexible, and efficient high-quality service.
• Establish an awards and penalty system for employees (attitude, efficiency).
2. Give priority to areas with a high number of reviews but relatively low ratings, e.g. Downtown and Midtown West.
Recommendations
3. Each zip code area should try to improve on its customers’ top three concerns, regardless of its overall rating.
E.g. Inwood and Washington Heights
Thank You
Q&A
Appendix 1 Manhattan Zip Code
Appendix 2 Content Analyzer Output One
Appendix 3 Content Analyzer Output Two