
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc

Date post: 05-Dec-2014
Upload: vivian-s-zhang
Description:
Data Science Academy, Student Demo day, Data science by R, Vivian S. Zhang, see www.nycdatascience.com for more details.
Transcript
Page 1: Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc

Businesses in NYC
What types of businesses are found in the city?

By: Divyanka Sharma

Page 2:

Aim
To understand what types of businesses are found in New York City.
What are the concentrations, according to frequency, in each ZCTA?
Can we compare neighborhoods?
What types of business should I open?

Page 3:

Terms and Data Used
ZCTA: ZIP Code Tabulation Area. These are conversions of ZIP codes for easier data analysis. There are small differences, but they are mostly the same as ZIP codes. Only NYC ZCTAs are used.
NAICS codes: North American Industry Classification System. These codes define the industry that a business falls under.

Page 4:

Data Sources Used
ZCTA: downloaded from the Census Bureau.
NAICS codes: dataset bought from Dun and Bradstreet, a data provider. It contains the names of all businesses, their NAICS codes, ZIP codes, and other top-level information for the entire United States. It was bought by my company.

Page 5:

Cleaning the data
The first step is to extract only the NYC data from the US file.
Convert ZIP codes to ZCTAs for easy comparison. This is also useful for running more tests later using other census information.
Attach descriptions of the NAICS code ID numbers to the dataset so the data is readable.
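The three cleaning steps above could be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the presenter's actual code (the deck mentions R but shows none). The sample rows, the field names (`zip`, `naics`), the crosswalk, and the `clean` helper are all hypothetical stand-ins for the Dun and Bradstreet file and the Census ZCTA data.

```python
# Hypothetical rows standing in for the national business file.
businesses = [
    {"name": "Corner Deli", "zip": "11372", "naics": "445110"},
    {"name": "Midtown Law Office", "zip": "10017", "naics": "541110"},
    {"name": "Chicago Diner", "zip": "60614", "naics": "722511"},
]

# Hypothetical ZIP -> ZCTA crosswalk restricted to NYC.
# Illustrative values only; a real crosswalk would be loaded from a file.
zip_to_zcta = {"11372": "11372", "10017": "10017"}

# Illustrative NAICS code -> description lookup.
naics_descriptions = {
    "445110": "Supermarkets and Other Grocery (except Convenience) Stores",
    "541110": "Offices of Lawyers",
    "722511": "Full-Service Restaurants",
}


def clean(rows, crosswalk, descriptions):
    """Keep NYC rows only, add a ZCTA column, and attach NAICS descriptions."""
    cleaned = []
    for row in rows:
        zcta = crosswalk.get(row["zip"])
        if zcta is None:
            continue  # ZIP is not in the NYC crosswalk, so drop the row
        cleaned.append({
            **row,
            "zcta": zcta,
            "naics_description": descriptions.get(row["naics"], "Unknown"),
        })
    return cleaned


nyc_businesses = clean(businesses, zip_to_zcta, naics_descriptions)
for row in nyc_businesses:
    print(row["zcta"], row["naics_description"], row["name"])
```

Filtering through the crosswalk does two steps at once in this sketch: any ZIP code without an NYC ZCTA is dropped, and every remaining row gains its ZCTA.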

Page 6:

What do we find?
The top 10 most common businesses, by frequency of physical outlets, are the following:

Page 7:
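A ranking of business types by number of physical outlets, as described on the previous slide, could be computed along these lines. This is only a sketch under assumptions: the `outlets` rows are made-up sample data, each row is taken to represent one physical outlet, and `top_business_types` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical cleaned records, one row per physical outlet.
outlets = [
    {"zcta": "11372", "naics_description": "Full-Service Restaurants"},
    {"zcta": "11372", "naics_description": "Full-Service Restaurants"},
    {"zcta": "10017", "naics_description": "Offices of Lawyers"},
    {"zcta": "10017", "naics_description": "Full-Service Restaurants"},
    {"zcta": "10017", "naics_description": "Offices of Lawyers"},
    {"zcta": "11372", "naics_description": "Supermarkets and Other Grocery (except Convenience) Stores"},
]


def top_business_types(rows, n=10):
    """Return the n most frequent business types as (description, count) pairs."""
    counts = Counter(row["naics_description"] for row in rows)
    return counts.most_common(n)


top = top_business_types(outlets, n=10)
for description, count in top:
    print(f"{count:>3}  {description}")
```

With fewer than ten distinct types in the sample, `most_common(10)` simply returns all of them in descending order of count.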
Page 8:

Example plots of businesses in certain ZCTAs
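The plots themselves are not preserved in this transcript. As a rough illustration of the numbers such plots might be built from, the sketch below computes each business type's share of outlets within a ZCTA, which is one way to read "concentrations, according to frequency". The sample pairs and the `shares_by_zcta` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (zcta, business type) pairs, one per physical outlet.
outlet_pairs = [
    ("11372", "Full-Service Restaurants"),
    ("11372", "Full-Service Restaurants"),
    ("11372", "Full-Service Restaurants"),
    ("11372", "Offices of Lawyers"),
    ("10017", "Offices of Lawyers"),
    ("10017", "Offices of Lawyers"),
    ("10017", "Full-Service Restaurants"),
    ("10017", "Full-Service Restaurants"),
]


def shares_by_zcta(pairs):
    """Map each ZCTA to {business type: fraction of that ZCTA's outlets}."""
    counts = defaultdict(Counter)
    for zcta, business_type in pairs:
        counts[zcta][business_type] += 1

    shares = {}
    for zcta, type_counts in counts.items():
        total = sum(type_counts.values())
        shares[zcta] = {t: c / total for t, c in type_counts.items()}
    return shares


shares = shares_by_zcta(outlet_pairs)
for zcta, type_shares in shares.items():
    for business_type, share in type_shares.items():
        print(f"{zcta}  {share:.0%}  {business_type}")
```

Using shares rather than raw counts makes ZCTAs of different sizes comparable, which matters when comparing neighborhoods.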

Page 9:

Queens

Page 10:

Manhattan

Page 11:

Brooklyn

Page 12:

Queens

Page 13:

The Bronx

Page 14:

Problems with the data
The data is from 2012, so there could be some changes.
The NAICS codes themselves are not very clear. Example: the “all other businesses” category.
This is self-reported data, so there can be biases.

Page 15:

Future Potential
Can layer other information on top of this to study more trends.
Can analyze what businesses an entrepreneur should look into starting in certain ZCTAs.
If time series data is available, plot the change in frequency of businesses.
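If snapshots from more than one year were available, the change in frequency could be computed before plotting, for example as below. This is a speculative sketch: the deck only has 2012 data, so both snapshots and the `frequency_change` helper are hypothetical.

```python
from collections import Counter

# Hypothetical snapshots: one business type label per outlet in each year.
earlier_snapshot = [
    "Full-Service Restaurants",
    "Full-Service Restaurants",
    "Offices of Lawyers",
]
later_snapshot = [
    "Full-Service Restaurants",
    "Offices of Lawyers",
    "Offices of Lawyers",
    "Supermarkets and Other Grocery (except Convenience) Stores",
]


def frequency_change(earlier, later):
    """Return {business type: later count minus earlier count}."""
    before, after = Counter(earlier), Counter(later)
    # A Counter returns 0 for missing keys, so new or vanished types work too.
    return {t: after[t] - before[t] for t in sorted(set(before) | set(after))}


change = frequency_change(earlier_snapshot, later_snapshot)
for business_type, delta in change.items():
    print(f"{delta:+d}  {business_type}")
```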

