Week 5 presentation

Date post: 18-Aug-2015
Upload: apyelton
Transcript
Page 1: Week 5 presentation

Can you run faster?

Alexis Yelton

Page 2: Week 5 presentation

Runners want to run faster

How fast can I run a half marathon?

How can I improve on that time?

Page 3: Week 5 presentation

Demo

Page 4: Week 5 presentation

Data from Strava.com

Pace

Time series, demographic, and aggregated running data on 10,000 runners. 1,000 with half-marathon times.

Page 5: Week 5 presentation

Data from Strava.com

Page 6: Week 5 presentation

Analysis

5-fold cross-validation

Model                                        Regression r2   RMSE
Benchmarking with a linear model             0.73            6.5 min
Reducing number of features:
Ensemble partial least squares regression    0.73            6.4 min

Validation: 72 runners                       0.63            7.2 min
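The r2 and RMSE figures on this slide are cross-validated scores. As a rough sketch of how such numbers can be produced, the code below fits a one-feature linear model inside a 5-fold loop and scores the pooled out-of-fold predictions. The data are synthetic stand-ins (not the Strava data), the single pace feature is an assumption for brevity, and the function names are hypothetical; the slides do not show the actual implementation.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


def cross_validate(xs, ys, k=5, seed=0):
    """Pool out-of-fold predictions from k folds, then score them once."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    preds = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a + b * xs[i]
    return r2_and_rmse(ys, preds)


# Synthetic stand-in data (NOT the Strava data): pace in min/mile, time in minutes.
rng = random.Random(42)
pace = [rng.uniform(7.0, 11.0) for _ in range(200)]
half_time = [13.1 * p + rng.gauss(0.0, 6.0) for p in pace]

r2, rmse = cross_validate(pace, half_time, k=5)
print(f"r2 = {r2:.2f}, RMSE = {rmse:.1f} min")
```

Scoring only held-out predictions is what makes the numbers comparable to the separate validation row: neither is computed on data the model was fitted to.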

Page 7: Week 5 presentation

About me: Alexis Yelton, MIT postdoc

Chitinase in marine cyanobacteria

[Figure: chitinase activity]

My first half marathon: 1:56:30

Personal best: 1:47:56

Page 8: Week 5 presentation

22 Features

Month Distance          Weight Range
Month Runs              Gender
Month Elevation         Rest Days / Week
Month Pace              Fast Days / Week
Month Time              Long Days / Week
6 Month Distance        5K Time
6 Month Runs            Marathon Time
6 Month Elevation       Minimum Pace
6 Month Pace            Minimum Pace > 2 mi
6 Month Time            Minimum Pace > 3 mi
Age Range               SD Pace
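Several of these features are aggregates over a window of recent runs. The sketch below shows one plausible way to derive a few of them from a per-run log. The `Run` record, the `window_features` helper, the window length, and the 8-mile "long day" cutoff are all assumptions made for illustration; the slides do not give the exact definitions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import pstdev


@dataclass
class Run:
    day: date
    miles: float
    minutes: float

    @property
    def pace(self):
        """Pace in minutes per mile."""
        return self.minutes / self.miles


def window_features(runs, end, days=30, long_run_miles=8.0):
    """Aggregate runs in the `days` days up to and including `end`.

    The window length and the "long day" cutoff are illustrative
    assumptions, not values taken from the slides.
    """
    start = end - timedelta(days=days - 1)
    recent = [r for r in runs if start <= r.day <= end]
    if not recent:
        return None
    total_miles = sum(r.miles for r in recent)
    total_minutes = sum(r.minutes for r in recent)
    run_days = {r.day for r in recent}
    weeks = days / 7.0
    return {
        "distance": total_miles,
        "runs": len(recent),
        "time": total_minutes,
        "pace": total_minutes / total_miles,  # distance-weighted average pace
        "sd_pace": pstdev([r.pace for r in recent]),
        "min_pace": min(r.pace for r in recent),  # fastest single-run pace
        "rest_days_per_week": (days - len(run_days)) / weeks,
        "long_days_per_week": sum(r.miles >= long_run_miles for r in recent) / weeks,
    }


# Made-up log for illustration only.
log = [
    Run(date(2015, 8, 1), 3.0, 27.0),
    Run(date(2015, 8, 3), 5.0, 40.0),
    Run(date(2015, 8, 8), 10.0, 90.0),
    Run(date(2015, 7, 1), 4.0, 36.0),  # falls outside the 14-day window below
]
features = window_features(log, end=date(2015, 8, 14), days=14)
print(features)
```

Calling the same helper with a window of about 180 days would give the "6 Month" counterparts, which is one reason the month and 6-month features tend to be strongly correlated.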

Page 9: Week 5 presentation

Results

Half Marathon Time

Errors vary with half marathon time. A larger data set would allow for better predictions for faster and slower runners.
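One way to check how errors vary with finish time is to compute RMSE separately for bins of actual half marathon time. The following is a minimal sketch with made-up numbers; the `rmse_by_bin` helper and the 15-minute bin width are assumptions, not something taken from the slides.

```python
import math
from collections import defaultdict


def rmse_by_bin(actual, predicted, bin_width=15.0):
    """Group runners by actual finish time and return RMSE per bin.

    Keys are the lower edge of each bin, in minutes.
    """
    squared = defaultdict(list)
    for a, p in zip(actual, predicted):
        lower_edge = math.floor(a / bin_width) * bin_width
        squared[lower_edge].append((a - p) ** 2)
    return {edge: math.sqrt(sum(v) / len(v)) for edge, v in sorted(squared.items())}


# Made-up finish times in minutes, for illustration only.
actual = [92.0, 95.0, 110.0, 112.0, 118.0, 140.0, 149.0]
predicted = [101.0, 104.0, 111.0, 110.0, 119.0, 131.0, 137.0]

per_bin = rmse_by_bin(actual, predicted)
for edge, err in per_bin.items():
    print(f"{edge:.0f}-{edge + 15:.0f} min: RMSE {err:.1f} min")
```

Bins at the fast and slow ends usually contain few runners, so their RMSE estimates are noisy as well as large, which is consistent with the slide's point that more data would help at the extremes.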

Page 10: Week 5 presentation

Results

Page 11: Week 5 presentation
Page 12: Week 5 presentation
Page 13: Week 5 presentation

Analysis

3-fold cross-validation

Model                                           Regression r2   RMSE
Benchmarking with a linear model                0.73            6.5 min
Dealing with collinear features (and reducing number of features):
1. Ensemble partial least squares regression    0.72            6.6 min
2. Linear model                                 0.71            6.7 min
3. Lasso regression                             0.69            6.8 min
4. Ridge regression                             0.72            6.7 min
5. Random forest regression                     0.67            7.1 min

Validation: 69 runners                          0.63            7.2 min
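To illustrate why collinear features call for special handling, here is a small sketch of closed-form ridge regression on two nearly identical features. With the penalty set to zero it reduces to ordinary least squares; increasing the penalty shrinks the coefficient vector. The `ridge_2d` helper and the data are hypothetical and only stand in for a pair of correlated features such as month pace and 6-month pace; this is not the model fitted in the slides.

```python
import math


def ridge_2d(x1, x2, y, lam):
    """Closed-form ridge fit for two centered features (no intercept).

    Solves (X'X + lam * I) b = X'y for the 2x2 case; lam = 0 gives OLS.
    """
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


# Made-up, centered values: x2 is almost a copy of x1.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.0, -1.1, 0.1, 1.0, 2.0]
y = [-27.0, -12.0, 1.0, 12.0, 26.0]

for lam in (0.0, 1.0, 10.0):
    b1, b2 = ridge_2d(x1, x2, y, lam)
    print(f"lambda={lam:>4}: b1={b1:7.2f}, b2={b2:7.2f}, norm={math.hypot(b1, b2):.2f}")
```

When two features carry almost the same information, the unpenalized determinant is close to zero, so small changes in the data can move the least-squares coefficients a lot. The penalty term keeps the system well conditioned.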

Page 14: Week 5 presentation

Analysis

3-fold cross-validation

Model                                           Regression r2   RMSE
Benchmarking with a linear model                0.73            6.5 min
Reducing number of features:
1. Ensemble partial least squares regression    0.72            6.6 min
2. Linear model                                 0.71            6.7 min
3. Lasso regression                             0.69            6.8 min
Other models with these features:
4. Ridge regression                             0.72            6.7 min
5. Random forest regression                     0.67            7.1 min

Validation: 69 runners                          0.63            7.2 min

Page 15: Week 5 presentation

Your average pace over the past month is the most important feature by far.

Results

Variable importance

Increase in node purity

Pace past month

5K time

Pace past month

Rest days

SD pace

Weight

Long days

Age

Gender
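"Increase in node purity" is a tree-based importance measure: for regression trees it is commonly computed as the reduction in residual sum of squares (RSS) achieved by splits on a feature, accumulated over the splits and averaged over the trees of a forest. The sketch below computes that reduction for a single best split per feature on made-up data. The helper names and values are hypothetical, and a real random forest would sum over many splits and trees.

```python
def rss(values):
    """Residual sum of squares around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_gain(feature, target):
    """Largest RSS reduction from one threshold split on `feature`."""
    parent = rss(target)
    best = 0.0
    for threshold in sorted(set(feature))[:-1]:
        left = [t for f, t in zip(feature, target) if f <= threshold]
        right = [t for f, t in zip(feature, target) if f > threshold]
        best = max(best, parent - rss(left) - rss(right))
    return best


# Made-up values for illustration only.
pace_past_month = [7.5, 8.0, 9.0, 10.0]       # min/mile
rest_days = [2, 3, 2, 3]                      # per week
half_time = [100.0, 105.0, 120.0, 130.0]      # minutes

gains = {
    "Pace past month": best_split_gain(pace_past_month, half_time),
    "Rest days": best_split_gain(rest_days, half_time),
}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: RSS reduction {gain:.2f}")
```

A feature that separates fast from slow finishers cleanly yields a large RSS reduction, which is how a single dominant feature such as recent pace ends up far ahead of the rest on this kind of chart.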

