
Big data Presention (1)

Date post: 07-Apr-2017
Upload: wei-chen
YELP DATA ANALYSIS, TEAM 3: WEI CHEN, RITVIK NANDIPATI, SHIVA RAKSHITH, KAVISHA SHAH, JIALING ZHANG
Transcript
Page 1: Big data Presention (1)

YELP DATA ANALYSIS
TEAM 3

WEI CHEN, RITVIK NANDIPATI, SHIVA RAKSHITH, KAVISHA SHAH, JIALING ZHANG

Page 2: Big data Presention (1)

AGENDA

Introduction of Project
Business Questions to Address
Data Understanding
Analysis (Exploratory, Logistic, Recommendation)
Dashboard Demonstration
Conclusion

Page 3: Big data Presention (1)

INTRODUCTION

Use big data tools in conjunction with others to derive insights from the data that help the business make better strategic decisions.

About Yelp:
Founded in 2004 to help people find great local businesses
Publishes crowd-sourced reviews about local businesses, and offers online reservations and online food delivery
Trains businesses, hosts social events and provides data
3 components: contributors, consumers and local businesses
Revenue sources: selling ads and sponsored listings

Source: Wikipedia, vivial.net

Page 4: Big data Presention (1)

BUSINESS QUESTIONS TO ADDRESS

Role of location in business success
Popular business type for given locations
Seasonality trend for various business categories
Understand reasons behind good/bad reviews
Optimize business recommendations for users

Page 5: Big data Presention (1)

DATA UNDERSTANDING

Only 2 of the 5 datasets are used: Business and Review.

Business attributes: Business_id, Categories, City, Full_address, Hours, Latitude, Longitude, Name, Neighborhoods, Open, Review_count, Stars, State, Type

Review attributes: Business_id, Date, Review_id, Stars, Text, Type, User_id, Votes

Page 6: Big data Presention (1)

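ROLE OF LOCATION IN BUSINESS SUCCESS

from pyspark.sql.functions import desc
# Businesses per city, then per (city, star rating), largest counts first
loc = businessDF.groupBy('city').count().sort(desc('count'))
loc_star = businessDF.groupBy('city','stars').count().sort(desc('count'),desc('stars'))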

City       Stars  Count
Las Vegas  4      3688
Las Vegas  3.5    3584
Las Vegas  5      3191
Las Vegas  4.5    3029
Las Vegas  3      2511
Phoenix    4      2208
Phoenix    5      2141
Phoenix    3.5    2018
Phoenix    4.5    1797
Las Vegas  2.5    1710
Phoenix    3      1446
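To make the grouping logic easy to try without a Spark cluster, here is a minimal plain-Python sketch of the same two aggregations. The sample rows and the helper `count_by` are hypothetical and only stand in for `businessDF`; ties are broken by the key itself, which differs slightly from the Spark sort above.

```python
from collections import Counter

# Hypothetical sample rows standing in for businessDF; not real Yelp data.
businesses = [
    {"city": "Las Vegas", "stars": 4.0},
    {"city": "Las Vegas", "stars": 4.0},
    {"city": "Las Vegas", "stars": 3.5},
    {"city": "Phoenix", "stars": 4.0},
    {"city": "Phoenix", "stars": 5.0},
    {"city": "Tempe", "stars": 3.0},
]


def count_by(rows, *keys):
    """Group rows by the given keys and return (key_tuple, count) pairs,
    largest count first; ties are broken by the key tuple, descending."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)


loc = count_by(businesses, "city")                 # businesses per city
loc_star = count_by(businesses, "city", "stars")   # businesses per (city, stars)

for key, count in loc_star:
    print(key, count)
```

Each result row pairs a key tuple with its count, which mirrors the shape of the table above.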

[Figure: bar chart "Reviews by City" covering Las Vegas, Phoenix, Scottsdale, Charlotte, Tempe, Pittsburgh, Henderson, Montréal, Mesa and Chandler; value axis from 0 to 1,200,000]

Page 7: Big data Presention (1)

POPULAR BUSINESS TYPE FOR GIVEN LOCATIONS

# city_cat is not defined on the slide; it presumably holds one row per (city, category) pair
city_cat_count = city_cat.groupBy('city','category').count().sort(desc('count'))

city       category              count
Las Vegas  Food                  1562
Las Vegas  Local Services        898
Las Vegas  Shopping              866
Las Vegas  Active Life           710
Las Vegas  Bars                  690
Las Vegas  Arts & Entertainment  586
Las Vegas  Hotels & Travel       564
Las Vegas  Hair Salons           563
Las Vegas  Automotive            448
Las Vegas  Fast Food             445

city       category              count
Phoenix    Food                  995
Phoenix    Local Services        672
Phoenix    Shopping              571
Phoenix    Active Life           397
Phoenix    Mexican               388
Phoenix    Home Services         381
Phoenix    Auto Repair           327
Phoenix    Hotels & Travel       316
Phoenix    Health & Medical      306
Phoenix    Automotive            302
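The slide does not show how the (city, category) pairs were built. One plausible route, sketched here in plain Python, is to emit one pair per category a business lists and then count the pairs. The sample rows, the assumption that Categories is a list of names, and the helper `top_categories` are all hypothetical.

```python
from collections import Counter

# Hypothetical rows; assumes each business carries a list of category names.
businesses = [
    {"city": "Las Vegas", "categories": ["Food", "Bars"]},
    {"city": "Las Vegas", "categories": ["Food", "Fast Food"]},
    {"city": "Phoenix", "categories": ["Food", "Mexican"]},
    {"city": "Phoenix", "categories": []},
]

# One (city, category) pair per category a business lists, like an "explode".
city_cat = [(b["city"], c) for b in businesses for c in b["categories"]]

# Count the pairs and order them by count, largest first.
city_cat_count = Counter(city_cat).most_common()


def top_categories(city, n=10):
    """Return up to n (category, count) pairs for one city, largest first."""
    return [(cat, cnt) for (c, cat), cnt in city_cat_count if c == city][:n]


print(top_categories("Las Vegas"))
```

A business with several categories contributes to several rows, so the per-city counts can add up to more than the number of businesses in that city.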

Page 8: Big data Presention (1)

SEASONALITY TREND FOR VARIOUS BUSINESS CATEGORIES

Fields used:
Review: Month, Business ID
Business: Category, City

Assumptions:
Number of reviews can be used as a proxy for visits
Users review the business within a reasonable time of their visit
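Under the two assumptions above, a seasonality signal can be approximated by counting reviews per category and calendar month. The following is a minimal plain-Python sketch, not the team's actual pipeline: the sample data, the one-category-per-business mapping, the ISO date format and the function `reviews_per_month` are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data; field names follow the attribute lists in the deck.
business_category = {"b1": "Active Life", "b2": "Food"}  # business_id -> category
reviews = [
    {"business_id": "b1", "date": "2015-07-04"},
    {"business_id": "b1", "date": "2016-07-19"},
    {"business_id": "b1", "date": "2016-01-02"},
    {"business_id": "b2", "date": "2016-07-30"},
    {"business_id": "zz", "date": "2016-03-01"},  # no matching business
]


def reviews_per_month(reviews, business_category):
    """Count reviews per (category, month), pooling all years together."""
    counts = Counter()
    for r in reviews:
        category = business_category.get(r["business_id"])
        if category is None:
            continue  # inner-join behaviour: skip reviews without a business
        month = date.fromisoformat(r["date"]).month
        counts[(category, month)] += 1
    return counts


monthly = reviews_per_month(reviews, business_category)
for (category, month), n in sorted(monthly.items()):
    print(category, month, n)
```

Pooling years hides long-term growth in review volume, so a real analysis might normalise each month by that year's total before comparing categories.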

Page 9: Big data Presention (1)

DASHBOARD


Page 10: Big data Presention (1)

UNDERSTAND REASONS BEHIND GOOD/BAD REVIEWS


Page 11: Big data Presention (1)

Emotional: disgust, horrible
Food taste: bland
Wait time: minute
Service quality: rude, manag
Price: Money, wast
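The slide does not say how these terms were extracted, so the sketch below only illustrates how such a term list could be applied: it tags a review with every theme whose term (or stem, such as "manag" or "wast") starts one of the review's words. The `THEMES` mapping copies the slide; the function `themes_in` and the sample sentence are hypothetical.

```python
import re

# Theme -> terms, copied from the slide. Stems such as "manag" and "wast"
# are matched as word prefixes ("manager", "wasted").
THEMES = {
    "Emotional": ["disgust", "horrible"],
    "Food taste": ["bland"],
    "Wait time": ["minute"],
    "Service quality": ["rude", "manag"],
    "Price": ["money", "wast"],
}


def themes_in(text):
    """Return the themes whose terms prefix at least one word of the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [
        theme
        for theme, stems in THEMES.items()
        if any(word.startswith(stem) for word in words for stem in stems)
    ]


print(themes_in("We waited 40 minutes and the manager was rude."))
```

Prefix matching is crude (for example, "wasteland" would count as a price complaint), so it is best read as a quick way to bucket reviews rather than a classifier.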

Page 12: Big data Presention (1)

RECOMMENDATION SYSTEMS

Leveraged Spark MLlib to create a recommendation engine for all users of the Yelp platform. It uses the Alternating Least Squares (ALS) method to generate the recommendations.

Raw data Preparation
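To show what the alternating step does, here is a dependency-free sketch of ALS with a single latent factor per user and per business. It is not Spark MLlib's implementation and not the team's code: the ratings, the regularisation value `reg`, and the functions `als_rank1` and `squared_error` are assumptions for illustration.

```python
# Hypothetical ratings: (user_id, business_id, stars). Not real Yelp data.
ratings = [
    ("u1", "b1", 5.0), ("u1", "b2", 4.0),
    ("u2", "b1", 4.0), ("u2", "b3", 2.0),
    ("u3", "b2", 4.0), ("u3", "b3", 2.0),
]


def als_rank1(ratings, reg=0.1, iterations=20):
    """Fit stars ~ user_factor * item_factor by alternating least squares."""
    users = {u: 1.0 for u, _, _ in ratings}
    items = {b: 1.0 for _, b, _ in ratings}
    for _ in range(iterations):
        # Hold item factors fixed: each user factor has a closed-form
        # ridge-regression solution over that user's observed ratings.
        for u in users:
            num = sum(r * items[b] for uu, b, r in ratings if uu == u)
            den = reg + sum(items[b] ** 2 for uu, b, _ in ratings if uu == u)
            users[u] = num / den
        # Hold user factors fixed and solve for each item factor the same way.
        for b in items:
            num = sum(r * users[u] for u, bb, r in ratings if bb == b)
            den = reg + sum(users[u] ** 2 for u, bb, _ in ratings if bb == b)
            items[b] = num / den
    return users, items


def squared_error(ratings, users, items):
    """Sum of squared differences between observed and predicted stars."""
    return sum((r - users[u] * items[b]) ** 2 for u, b, r in ratings)


users, items = als_rank1(ratings)
# Predicted rating for a business that user u1 has not reviewed.
predicted_u1_b3 = users["u1"] * items["b3"]
print(round(predicted_u1_b3, 2), round(squared_error(ratings, users, items), 3))
```

A production setup would use more than one factor and a held-out set to choose the rank and regularisation, which is what a library implementation such as MLlib's ALS is designed for.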


Page 13: Big data Presention (1)

CONCLUSION

Which locations are most promising?
Which categories are most promising?
How can business owners staff their employees?
What attributes play important roles in user reviews?

