Text Mining in Social Network

Date post: 22-Jul-2015
Upload: yi-chun-nancy-chien
Transcript
Page 1: Text Mining in Social Network

Customer Opinions Analysis for Starbucks in Yelp

Web Analytics, Fall 2014

Professor Yilu Zhou

ISGB 7978

Team:

Yixi Zhang, Xiaoshan Jin, Yi Chun Chien, Yi Ting Kao

Page 2: Text Mining in Social Network

Agenda

1. Problem Statement

2. Project Design

3. Stage 1 - Analytic Pre-define

4. Stage 2 - Unstructured Data Analysis

• Correlation Analysis (Overall rating)

• How Rating Differs from Location (Overall rating)

• Feature Selection (Low rating)

• Python Feature Counts Algorithm (Low rating)

• Definition of Top Bad Performance Areas (Low rating)

• Analytics – Manhattan Visualization (Low rating)

5. Analytics Summary & Recommendation

Page 3: Text Mining in Social Network

Problem Statement

There are 212 Starbucks stores in Manhattan. Their average rating on Yelp is 2.8 stars, and some reviews carry a Low rating of 1 to 2 stars.

Project Goal:

Identify the factors behind the poor performance of Starbucks stores so that customer satisfaction can be raised to the highest level.

Page 4: Text Mining in Social Network

Project Design

Page 5: Text Mining in Social Network

Stage 1 - Analytic Pre-define

• Platform & Tool Selection: Python, Content Analyzer and JMP

• Data collection:

• Use Python to crawl the Yelp pages of 176 Starbucks stores

• Variables: Store location, user location, user comment, user rating

• Reviews Distribution:

- Total review number: 3052

- Average Rating: 2.8

- 74% of customers from NY; 26% of customers from other places

• Pre-define Complaints Categories

• Product, Service, Waiting-time & Environment
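The collected variables and the distribution figures above could be represented roughly as follows. This is only a sketch: the `Review` class, the `summarize` helper, the sample rows, and the rule that a reviewer counts as "from NY" when the location string ends in "NY" are all assumptions made for illustration, not the team's crawler or data.

```python
from dataclasses import dataclass


@dataclass
class Review:
    store_zip: str      # store location (zip code)
    user_location: str  # reviewer's location as shown on the profile
    comment: str        # free-text review
    rating: int         # 1 to 5 stars


def summarize(reviews):
    """Return total count, average rating, and share of reviewers from NY."""
    total = len(reviews)
    avg_rating = sum(r.rating for r in reviews) / total
    # Assumption: a location string ending in "NY" means the user is from NY.
    ny_share = sum(1 for r in reviews if r.user_location.endswith("NY")) / total
    return {
        "total": total,
        "avg_rating": round(avg_rating, 2),
        "ny_share": round(ny_share, 2),
    }


# Made-up sample rows, not the project's data.
sample = [
    Review("10001", "New York, NY", "Long line and slow service.", 2),
    Review("10001", "Brooklyn, NY", "Friendly staff.", 4),
    Review("10016", "Boston, MA", "Clean and quiet.", 5),
    Review("10002", "New York, NY", "Rude cashier.", 1),
]
print(summarize(sample))
```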

[Charts: Review # by User Rating and by User Location]

Page 6: Text Mining in Social Network

Stage 2 - Correlation Analysis (Overall rating)

• Target variable: User Rating Group: High (4-5 stars) vs. Low (1-2 stars)

• Independent Variable: Store Area, User Location, Comment Length

• Use a Goodness-of-Fit Test to check whether each independent variable is correlated with the target variable

- Comment Length and Store Location are correlated with User Rating Group (both results marked Significant)
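As a rough illustration of this kind of check, the sketch below groups ratings into High and Low and computes a Pearson chi-square statistic on a contingency table of store area against rating group. This is a swapped-in, commonly used test of association; the deck ran its Goodness-of-Fit Test in JMP, and the exact procedure there may differ. The function names and the sample rows are hypothetical.

```python
from collections import Counter


def rating_group(stars):
    """Map a star rating to the deck's groups; 3-star reviews fall in neither."""
    if stars >= 4:
        return "High"
    if stars <= 2:
        return "Low"
    return None


def chi_square(pairs):
    """Pearson chi-square statistic and degrees of freedom for (x, y) pairs."""
    observed = Counter(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    n = len(pairs)
    stat = 0.0
    for x in row_totals:
        for y in col_totals:
            expected = row_totals[x] * col_totals[y] / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Made-up (store area, stars) rows, not the project's data.
rows = [("East", 5)] * 30 + [("East", 1)] * 10 + [("West", 4)] * 10 + [("West", 2)] * 30
pairs = [(area, rating_group(stars)) for area, stars in rows if rating_group(stars)]
stat, dof = chi_square(pairs)
print(round(stat, 2), dof)
```

With one degree of freedom, a statistic above about 3.84 corresponds to significance at the 5% level.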

Page 7: Text Mining in Social Network

Stage 2: How Rating Differs from Location (Overall Rating)

[Charts: Review # and Rating by store area]

Why does Midtown East rate better than Midtown West when both areas have similar numbers of reviews and similar user locations?

• Top 3 Bad Areas:

• Lower East Side

• Greenwich Village and SOHO

• Chelsea and Clinton

• Top 3 Good Areas:

• Central Park and Murray Hill

• Lower Manhattan

• Inwood and Washington Heights

Low Rating > 62%

High Rating > 52%
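One way to produce an area ranking like the one above is to compute, for each area, the share of reviews that are Low (1-2 stars) and High (4-5 stars), then sort. The sketch below assumes the percentages on this slide are such per-area shares, which the deck does not state explicitly; the helper names and the sample rows are made up.

```python
from collections import defaultdict


def rating_shares(rows):
    """Return {area: (low_share, high_share)} from (area, stars) rows."""
    counts = defaultdict(lambda: {"low": 0, "high": 0, "all": 0})
    for area, stars in rows:
        c = counts[area]
        c["all"] += 1
        if stars <= 2:
            c["low"] += 1
        elif stars >= 4:
            c["high"] += 1
    return {a: (c["low"] / c["all"], c["high"] / c["all"]) for a, c in counts.items()}


def top_areas(shares, index, n=3):
    """Rank areas by low share (index=0) or high share (index=1), highest first."""
    return sorted(shares, key=lambda a: shares[a][index], reverse=True)[:n]


# Made-up rows with placeholder area names, not the project's data.
rows = [
    ("Area A", 1), ("Area A", 2), ("Area A", 1), ("Area A", 5),
    ("Area B", 5), ("Area B", 4), ("Area B", 3), ("Area B", 1),
    ("Area C", 4), ("Area C", 5), ("Area C", 5), ("Area C", 5),
]
shares = rating_shares(rows)
print(top_areas(shares, 0, n=1), top_areas(shares, 1, n=1))
```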

Page 8: Text Mining in Social Network

Stage 2 - Feature Selection (Low rating)

Assumptions:

1) All Low-rating comments express only negative opinions about Starbucks;

2) For each zip code, the index of a feature is defined as (feature count) / (number of bad comments), so that features can be compared at the zip code level.

Content Analyzer output cleansing: stop-word removal and word stemming.

Finalized Feature list:

Product – (coffee, drink, drinks, cup, latte, tea, iced, milk, food, wrong)

Waiting time – (time, line, minutes, long, wait, slow, waiting, busy)

Environment – (bathroom, small, clean, seating)

Service – (people, service, staff, barista, baristas, rude, cashier, manager, friendly, attitude)

Page 9: Text Mining in Social Network

Stage 2 - Python Feature Counts Algorithm: (Low rating)

Calculation rule:

A review is labeled “1” for a category if any word from that category's feature list occurs in it; otherwise it is labeled “0”.

• Assess every user review by Product, Service, Waiting time, and Environment features;

• Group all of the feature counts based on store location (Zip Code).
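The counting rule can be sketched in Python as follows. The word lists are the finalized feature lists from the Feature Selection slide; the function names, the simple lowercase word tokenizer, the percentage scaling of the index, and the sample reviews are assumptions made for illustration, and this is not the team's original script.

```python
import re
from collections import defaultdict

FEATURES = {
    "Product": {"coffee", "drink", "drinks", "cup", "latte", "tea", "iced",
                "milk", "food", "wrong"},
    "Waiting time": {"time", "line", "minutes", "long", "wait", "slow",
                     "waiting", "busy"},
    "Environment": {"bathroom", "small", "clean", "seating"},
    "Service": {"people", "service", "staff", "barista", "baristas", "rude",
                "cashier", "manager", "friendly", "attitude"},
}


def label_review(comment):
    """Return {category: 1 or 0}: 1 if any feature word of the category occurs."""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return {cat: int(bool(words & vocab)) for cat, vocab in FEATURES.items()}


def index_by_zip(low_reviews):
    """Return {zip: {category: percent of Low-rating reviews labeled 1}}."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for zip_code, comment in low_reviews:
        totals[zip_code] += 1
        for cat, flag in label_review(comment).items():
            hits[zip_code][cat] += flag
    return {
        z: {cat: round(100 * hits[z][cat] / totals[z], 2) for cat in FEATURES}
        for z in totals
    }


# Made-up Low-rating reviews, not the project's data.
low_reviews = [
    ("10001", "The line was long and the barista was rude."),
    ("10001", "Wrong drink again."),
    ("10002", "Dirty bathroom, no seating."),
]
result = index_by_zip(low_reviews)
print(result)
```

Because a review is flagged at most once per category, the index reads as the percentage of bad comments in a zip code that mention that category at all, no matter how many feature words each comment contains.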

Page 10: Text Mining in Social Network

Stage 2 – Definition of Top Bad Performance Areas (Low rating)

Definition Rules (%)

• Environment Complaint: Index Range 10.71-60; Index Median 35.36; Top Bad Performance Index Point 35

• Product Complaint: Index Range 46.43-100; Index Median 73.21; Top Bad Performance Index Point 85

• Service Complaint: Index Range 43.75-100; Index Median 71.88; Top Bad Performance Index Point 85

• Waiting time Complaint: Index Range 44.44-100; Index Median 72.22; Top Bad Performance Index Point 65
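Applying these index points could look like the sketch below. It assumes a zip code counts as a top bad performer in a category when its index is at or above that category's index point; the deck does not say whether the comparison is inclusive. The `top_bad_zips` helper, the zip labels, and the index values are hypothetical.

```python
from statistics import median

# "Top Bad Performance Index Point" per category, as listed above.
THRESHOLDS = {"Environment": 35, "Product": 85, "Service": 85, "Waiting time": 65}


def top_bad_zips(index_by_zip, category):
    """Zip codes whose index for the category is at or above its index point."""
    cutoff = THRESHOLDS[category]  # assumption: the comparison is inclusive
    return sorted(z for z, idx in index_by_zip.items() if idx[category] >= cutoff)


# Made-up index values with placeholder zip labels, not the project's data.
sample_index = {
    "zip_a": {"Environment": 40.0, "Product": 90.0, "Service": 70.0, "Waiting time": 60.0},
    "zip_b": {"Environment": 20.0, "Product": 85.0, "Service": 88.0, "Waiting time": 66.0},
}
for category in THRESHOLDS:
    print(category, top_bad_zips(sample_index, category))

# The median of a category's index can be computed the same way per column.
env_median = median(idx["Environment"] for idx in sample_index.values())
```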

Page 11: Text Mining in Social Network

Analytics Summary

Manhattan Top Bad Performance Areas

• Environment Complaint: Upper West Side; Chelsea and Clinton; Greenwich Village and Soho

• Product Complaint: Lower East Side; Central Park and Murray Hill; Upper East Side; Inwood and Washington Heights; Central Harlem

• Service Complaint: Lower Manhattan; Upper East Side; Inwood and Washington Heights

• Waiting Time Complaint: Central Park and Murray Hill; Chelsea and Clinton; Lower Manhattan; Inwood and Washington Heights

Page 12: Text Mining in Social Network

Analytics – Manhattan Visualization (Low rating)

[Maps: Environment Complaint; Product Complaint]

Page 13: Text Mining in Social Network

Analytics – Manhattan Visualization (Low rating)

[Maps: Waiting Time Complaint; Service Complaint]

Midtown East has a lower “Service Complaints” rate than Midtown West.

Page 14: Text Mining in Social Network

Recommendations

To Manager of Manhattan area:

1. The common concerns for customers across all Manhattan areas are long waiting times and bad service.

• Hire more cashiers and baristas based on each store’s situation (financially efficient).

• Train current employees to provide more professional, flexible, and efficient service of high quality.

• Establish an award and penalty system for employees (attitude, efficiency).

2. Give priority to areas with a high number of reviews but relatively Low ratings, e.g. downtown and west midtown.

Page 15: Text Mining in Social Network

Recommendations

3. Each zip code area should try to address its customers' top three concerns regardless of its overall rating, e.g. Inwood and Washington Heights.

Page 16: Text Mining in Social Network

Thank You

Q&A

Page 17: Text Mining in Social Network

Appendix 1 Manhattan Zip Code

Page 18: Text Mining in Social Network

Appendix 2 Content Analyzer Output One

Page 19: Text Mining in Social Network

Appendix 3 Content Analyzer Output Two

