Text Mining in Social Network

Customer Opinions Analysis for Starbucks in Yelp

Web Analytics, Fall 2014

Professor Yilu Zhou

ISGB 7978

Team:

Yixi Zhang, Xiaoshan Jin, Yi Chun Chien, Yi Ting Kao

Agenda

1. Problem Statement

2. Project Design

3. Stage 1 - Analytic Pre-define

4. Stage 2 - Unstructured Data Analysis

• Correlation Analysis (Overall rating)

• How Rating Differs from Location (Overall rating)

• Feature Selection (Low rating)

• Python Feature Counts Algorithm (Low rating)

• Definition of Top Bad Performance Areas (Low rating)

• Analytics – Manhattan Visualization (Low rating)

5. Analytics Summary & Recommendation

Problem Statement

There are 212 Starbucks stores in Manhattan. Their average rating on Yelp is 2.8 stars, and some reviews give a Low rating of only 1 or 2 stars.

Project Goal:

Identify the factors behind Starbucks stores’ poor performance in order to ensure the highest level of customer satisfaction.

Project Design

Stage 1 - Analytic Pre-define

• Platform & Tool Selection: Python, Content Analyzer and JMP

• Data collection:

• Use Python to crawl 176 Starbucks stores on Yelp

• Variables: Store location, user location, user comment, user rating

• Reviews Distribution:

- Total review number: 3052

- Average Rating: 2.8

- 74% of customers from NY; 26% of customers from other places

• Pre-define Complaints Categories

• Product, Service, Waiting-time & Environment

[Charts: Review # by User Rating and by User Location]

Stage 2 - Correlation Analysis (Overall rating)

• Target variable: User Rating Group: High (4, 5 stars) vs. Low (1, 2 stars)

• Independent Variable: Store Area, User Location, Comment Length

• Use a Goodness-of-Fit Test to check for correlation between the target and each independent variable

- Comment Length and Store Area are significantly correlated with User Rating Group
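As a rough illustration of this kind of test, the sketch below computes a Pearson chi-square statistic of independence for a table of Store Area against User Rating Group. This is one common way to test association between two categorical variables; the slides do not say which statistic or tool was actually used. The counts are made up for illustration and are not the project's data, and the function name `pearson_chi_square` is hypothetical.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows = two store areas, columns = (Low, High) rating group.
observed = [
    [60, 40],
    [35, 65],
]
statistic, dof = pearson_chi_square(observed)

# 3.841 is the chi-square critical value for 1 degree of freedom at the 5% level.
CRITICAL_VALUE_DF1_5PCT = 3.841
is_significant = statistic > CRITICAL_VALUE_DF1_5PCT
print(f"chi-square = {statistic:.2f}, dof = {dof}, significant = {is_significant}")
```

A continuous variable such as Comment Length would first need to be binned into categories before it could be tested this way.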

Stage 2: How Rating Differs from Location (Overall Rating)

[Chart: Review # and Rating]

Why is Midtown East rated better than Midtown West when both areas have similar numbers of reviews and similar user locations?

• Top 3 Bad Areas:

• Lower East Side

• Greenwich Village and SOHO

• Chelsea and Clinton

• Top 3 Good Areas:

• Central Park and Murray Hill

• Lower Manhattan

• Inwood and Washington Heights

Low Rating > 62%

High Rating > 52%
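The Low and High rating percentages per area can be computed along the following lines. This is a minimal sketch, not the project's code: it assumes the share is taken over all reviews in an area (including 3-star reviews, which belong to neither group), and the area names and ratings in the sample are invented for illustration.

```python
from collections import defaultdict


def rating_group(stars):
    """Map a star rating to the deck's groups: Low (1-2), High (4-5), else None."""
    if stars in (1, 2):
        return "Low"
    if stars in (4, 5):
        return "High"
    return None


def rating_shares(reviews):
    """reviews: iterable of (area, stars). Returns {area: {"Low": pct, "High": pct}}."""
    counts = defaultdict(lambda: {"Low": 0, "High": 0, "total": 0})
    for area, stars in reviews:
        counts[area]["total"] += 1
        group = rating_group(stars)
        if group:
            counts[area][group] += 1
    # Assumption: percentages are relative to all reviews in the area.
    return {
        area: {g: 100.0 * c[g] / c["total"] for g in ("Low", "High")}
        for area, c in counts.items()
    }


# Made-up sample data for illustration only.
sample_reviews = [
    ("Area A", 1), ("Area A", 2), ("Area A", 5), ("Area A", 3),
    ("Area B", 4), ("Area B", 5),
]
shares = rating_shares(sample_reviews)
print(shares)
```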

Stage 2 - Feature Selection (Low rating)

Assumptions:

1) All comments with a Low rating express only negative opinions about Starbucks;

2) For every zip code, the index of each feature is defined as feature count / number of bad comments, so that features can be compared at the zip code level.

Content Analyzer output cleansing: stop-word removal and word stemming.

Finalized Feature list:

Product – (coffee, drink, drinks, cup, latte, tea, iced, milk, food, wrong)

Waiting time – (time, line, minutes, long, wait, slow, waiting, busy)

Environment – (bathroom, small, clean, seating)

Service – (people, service, staff, barista, baristas, rude, cashier, manager, friendly, attitude)

Stage 2 - Python Feature Counts Algorithm: (Low rating)

Calculation rule:

Any occurrence of a word from a feature list labels the review as “1” for that feature; otherwise it is labeled “0”.

• Assess every user review on the Product, Service, Waiting time, and Environment features;

• Group all of the feature counts by store location (Zip Code).
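The calculation rule and the zip code index can be sketched as follows. This is an illustrative reconstruction, not the team's actual script: the keyword lists come from the finalized feature list, but the function names, the simple lowercase word matching (no stemming), and the sample reviews are assumptions made for the example.

```python
import re
from collections import defaultdict

# Keyword lists taken from the finalized feature list in the slides.
FEATURES = {
    "Product": {"coffee", "drink", "drinks", "cup", "latte", "tea",
                "iced", "milk", "food", "wrong"},
    "Waiting time": {"time", "line", "minutes", "long", "wait", "slow",
                     "waiting", "busy"},
    "Environment": {"bathroom", "small", "clean", "seating"},
    "Service": {"people", "service", "staff", "barista", "baristas", "rude",
                "cashier", "manager", "friendly", "attitude"},
}


def label_review(text):
    """Return {category: 1 or 0}: 1 if any keyword of the category occurs."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {cat: int(bool(words & keywords)) for cat, keywords in FEATURES.items()}


def index_by_zip(reviews):
    """reviews: iterable of (zip_code, text) for Low-rating reviews.

    Returns {zip_code: {category: percent of reviews labelled 1}}.
    """
    totals = defaultdict(int)
    counts = defaultdict(lambda: dict.fromkeys(FEATURES, 0))
    for zip_code, text in reviews:
        totals[zip_code] += 1
        for cat, flag in label_review(text).items():
            counts[zip_code][cat] += flag
    return {
        z: {cat: 100.0 * n / totals[z] for cat, n in counts[z].items()}
        for z in totals
    }


# Made-up reviews for illustration only.
sample = [
    ("10001", "The line was so long and the barista was rude."),
    ("10001", "My latte was wrong."),
    ("10002", "No seating and the bathroom was dirty."),
]
labels = label_review(sample[0][1])
indexes = index_by_zip(sample)
print(labels)
print(indexes)
```

Because each review contributes at most 1 per category, the resulting index is the percentage of bad comments in a zip code that mention that category, so it cannot exceed 100.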

Stage 2 – Definition of Top Bad Performance Areas (Low rating)

Definition Rules (%)

Complaint Category    Index Range    Index Median    Top Bad Performance Index Point
Environment           10.71-60       35.36           35
Product               46.43-100      73.21           85
Service               43.75-100      71.88           85
Waiting time          44.44-100      72.22           65
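Applying these index points to a zip code's indexes might look like the sketch below. The thresholds are the ones listed in the definition rules; whether an area exactly at the index point counts as a top bad performer is not stated in the slides, so the `>=` comparison is an assumption, as are the function name and the sample values.

```python
# Top Bad Performance index points (%) from the definition rules.
THRESHOLDS = {"Environment": 35, "Product": 85, "Service": 85, "Waiting time": 65}


def top_bad_categories(index_row, thresholds=THRESHOLDS):
    """Return the complaint categories whose index reaches its index point.

    Assumption: an index equal to the index point is counted as flagged.
    """
    return [
        cat for cat, cutoff in thresholds.items()
        if index_row.get(cat, 0) >= cutoff
    ]


# Made-up index values (%) for one zip code, for illustration only.
example_row = {"Environment": 40.0, "Product": 73.2, "Service": 90.0, "Waiting time": 60.0}
flagged = top_bad_categories(example_row)
print(flagged)
```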

Analytics Summary

Manhattan Top Bad Performance Areas

Environment Complaint       Product Complaint              Service Complaint              Waiting Complaint
Upper West Side             Lower East Side                Lower Manhattan                Central Park and Murray Hill
Chelsea and Clinton         Central Park and Murray Hill   Upper East Side                Chelsea and Clinton
Greenwich Village and Soho  Upper East Side                Inwood and Washington Heights  Lower Manhattan
N/A                         Inwood and Washington Heights  N/A                            Inwood and Washington Heights
N/A                         Central Harlem                 N/A

Analytics – Manhattan Visualization (Low rating)

[Map panels: Environment Complaint, Product Complaint]

Analytics – Manhattan Visualization (Low rating)

[Map panels: Waiting Time Complaint, Service Complaint]

Midtown East has a lower “Service Complaints” rate than Midtown West

Recommendations

To Manager of Manhattan area:

1. The common concerns for customers across the whole Manhattan area are long waiting times and bad service.

• Hire more cashiers and baristas according to each store’s situation (financially efficient)

• Train current employees to provide more professional, flexible, and efficient high-quality service.

• Establish an awards and penalty system for employees. (Attitude, Efficiency)

2. Give priority to areas with a high number of reviews but a relatively Low rating, e.g. downtown and west midtown.

3. Each zip code area should try to improve on its customers’ top three concerns, no matter what overall rating it gets.

E.g. Inwood and Washington Heights

Thank You

Q&A

Appendix 1 Manhattan Zip Code

Appendix 2 Content Analyzer Output One

Appendix 3 Content Analyzer Output Two

Date post:	22-Jul-2015
Category:	Science
Upload:	yi-chun-nancy-chien