Data Science: Can weather predict Bikeshare usage?

Post on 05-Dec-2014

description

A project for the UMD data science course using freely available data from Capital Bikeshare.

transcript

Mother Nature’s Impact on Bike Ridership

Jackie Zajac

Kays Fattal

Naumaan Nasir

Does weather have a relationship with bike ridership?

Can we predict bike usage based on weather?

INTRODUCTION

• Our team

• Research questions

• Picking datasets

• Our audience

METHODOLOGY

• Why linear regression?

• How we manipulated the data

• MySQL engine aggregated the 3M-row trip table into daily sums of rental counts and duration

• Mashed up with 731 rows of weather data (2011, 2012)

• Added a Year field

• Tools: Excel, MySQL database, R (Rattle)
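
The aggregation and mash-up steps above could be sketched in Python roughly as follows. This is only an illustration: the team did this work in MySQL and Excel, and the field names (`rental_count`, `total_duration`, `max_temp`) and the sample rows are made up for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical trip records: (start date, ride duration in seconds).
trips = [
    (date(2011, 1, 1), 600),
    (date(2011, 1, 1), 900),
    (date(2012, 1, 1), 1200),
]

# Hypothetical daily weather rows keyed by date.
weather = {
    date(2011, 1, 1): {"max_temp": 45.0},
    date(2012, 1, 1): {"max_temp": 50.0},
}


def aggregate_trips(trips):
    """Collapse trip-level rows into one row per day: rental count and total duration."""
    daily = defaultdict(lambda: {"rental_count": 0, "total_duration": 0})
    for day, duration in trips:
        daily[day]["rental_count"] += 1
        daily[day]["total_duration"] += duration
    return dict(daily)


def merge_with_weather(daily, weather):
    """Join daily totals with weather rows by date and add a Year field."""
    merged = []
    for day in sorted(daily):
        if day in weather:  # keep only days present in both datasets
            row = {"date": day, "year": day.year}
            row.update(daily[day])
            row.update(weather[day])
            merged.append(row)
    return merged


rows = merge_with_weather(aggregate_trips(trips), weather)
```

With two full years of daily weather (2011 and 2012, a leap year), a join like this yields the 731 rows mentioned above.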

METHODOLOGY

• Picking our best configuration

• Categoric vs. numeric variables

• Must decide how to measure bike usage

• Must pick best variables

• Error analysis

PHASE I

• Began with a broad study of six regressions

• Two target variables (rental counts, duration)

• Three temperature measures: Minimum, Average, Maximum

• Chunked the day into three time ranges to reflect temperature during bike rides

• Evaluated multiple weather variables’ effect on regressions

• Ignored Date field
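
The "six regressions" grid (two targets times three temperature measures) can be illustrated with a small Python sketch. It is deliberately simplified: each fit uses temperature as the only predictor, whereas the team's regressions in Rattle also included other weather variables. The rows and field names below are invented for the example.

```python
from itertools import product

# Made-up daily rows for illustration only; not Capital Bikeshare data.
days = [
    {"min_temp": 30, "avg_temp": 38, "max_temp": 45, "rental_count": 1000, "duration": 15000},
    {"min_temp": 45, "avg_temp": 55, "max_temp": 66, "rental_count": 2500, "duration": 40000},
    {"min_temp": 60, "avg_temp": 70, "max_temp": 82, "rental_count": 4200, "duration": 70000},
    {"min_temp": 70, "avg_temp": 80, "max_temp": 93, "rental_count": 4000, "duration": 66000},
]

TARGETS = ["rental_count", "duration"]
TEMPERATURES = ["min_temp", "avg_temp", "max_temp"]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# One regression per (target, temperature measure) pair: 2 x 3 = 6 fits.
results = {}
for target, temp in product(TARGETS, TEMPERATURES):
    xs = [d[temp] for d in days]
    ys = [d[target] for d in days]
    intercept, slope = fit_line(xs, ys)
    results[(target, temp)] = r_squared(xs, ys, intercept, slope)

# Configuration with the highest in-sample R-squared on this toy data.
best = max(results, key=results.get)
```

Comparing the six R-squared values side by side is one way to narrow down which target and temperature measure to carry into a second phase.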

Plots

PHASE II

• Combining the data sets

• Picking best variables:

• Bike rental counts as sole target variable

• Maximum temperature

• Utilized date/year field

• Switched Snow to categoric variable

• Analyzed and refined our regression

• Higher accuracy – R-squared = .8374 or 83.74%

MSE AND R-SQUARED

• A measure of accuracy in one dataset predicting another

• Relationship between R-squared and MSE
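
One common way to state the relationship between the two measures is R-squared = 1 - MSE / variance of the actual values, with both computed on the same evaluation data. The Python sketch below checks this on a handful of invented numbers; it assumes the population variance (dividing by n) and is not the team's Rattle output.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - MSE / variance, using the population variance of the actual values."""
    mean_actual = sum(actual) / len(actual)
    variance = sum((a - mean_actual) ** 2 for a in actual) / len(actual)
    return 1 - mean_squared_error(actual, predicted) / variance


# Invented held-out rental counts and model predictions for them.
actual = [1000, 2000, 3000, 4000]
predicted = [1100, 1900, 3200, 3800]

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
```

A lower MSE on the same data therefore always means a higher R-squared, which is why the two can be read together when comparing models.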

FINAL MODEL

Y = Intercept + sum of (Weight × Variable)

Weight        Variable
-4004.501     Intercept
62.118        Maximum Temperature
-132.741      Average Wind
93.162        Precipitation
416.818       Visibility
2063.069      Year
-161.038      Snow [0.0-1.2] inches
-4.945        Snow [1.2-2.0] inches
-588.349      Snow [2.0-3.1] inches
-5.390        Snow [3.1-3.9] inches
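
To show how a weight table like this turns into a prediction, here is a minimal Python sketch. The weights are copied from the slide; everything else is an assumption, since the slides do not state the units, how Year is encoded, or which snow level is the baseline. The example treats Year as an already-encoded number and treats a day in none of the listed snow bins as contributing nothing.

```python
# Weights copied from the FINAL MODEL slide.
INTERCEPT = -4004.501
WEIGHTS = {
    "max_temp": 62.118,
    "avg_wind": -132.741,
    "precipitation": 93.162,
    "visibility": 416.818,
    "year": 2063.069,
}
# Snow is categoric: at most one bin applies per day.
SNOW_WEIGHTS = {
    "0.0-1.2": -161.038,
    "1.2-2.0": -4.945,
    "2.0-3.1": -588.349,
    "3.1-3.9": -5.390,
}


def predict_rentals(features, snow_bin=None):
    """Return intercept + sum(weight * value), plus the snow bin weight if given.

    Assumption: snow_bin=None is the baseline category and adds nothing.
    """
    total = INTERCEPT
    for name, weight in WEIGHTS.items():
        total += weight * features[name]
    if snow_bin is not None:
        total += SNOW_WEIGHTS[snow_bin]
    return total


# Hypothetical inputs; the value 1 for "year" is an assumed encoding.
example = {"max_temp": 80, "avg_wind": 5, "precipitation": 0, "visibility": 10, "year": 1}
prediction = predict_rentals(example)
snowy_prediction = predict_rentals(example, snow_bin="2.0-3.1")
```

Because Snow is categoric, its effect enters as a fixed offset per bin rather than as a weight multiplied by inches of snow.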

LESSONS LEARNED

• Too many independent variables to incorporate crime dataset in addition to weather dataset

• Mean Squared Error (MSE), R-squared

• Only two years’ worth of data was available due to Bikeshare’s short history (2011, 2012)

• Final model would be even more accurate with additional historical data

CONCLUSION

• Our hypotheses proved true: weather does affect bike ridership

• Why is Maximum Temperature better?

• Why does the Year improve accuracy?

• The categorical range of snow inches

QUESTIONS?

Thanks!