ShootingStars_Powerpoint for march madness

Date post: 22-Jan-2018
Upload: pritha-sinha
Page 1: ShootingStars_Powerpoint for march madness

MARCH DATA CRUNCH MADNESS

The Shooting Stars

Nan (Miya) Wang

John De Martino

Pritha Sinha

Armi Thassim

INTRODUCTION

Background: The National Collegiate Athletic Association (NCAA) basketball tournament is played every spring in the US, with 68 college teams competing in a single-elimination format.

Objective: Create an optimized model to predict the 2016 NCAA Finals from historical regular-season data (2002 to 2015) by applying various machine learning techniques.

Results:

http://shootingstarsnyc.azurewebsites.net/

The link above points to our machine learning web API, which can help you make your own 2016 NCAA predictions!

ANALYSIS KPI

Model Performance Evaluation Metrics

Find a set of predictions that minimizes log loss.

Log loss heavily penalizes predictions that are simultaneously confident and wrong.

A good model balances between being too conservative and too confident.

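$$\mathrm{LogLoss} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

$n$: actual number of games played in the tournament

$\hat{y}_i$: predicted probability that team A beats team B

$y_i$: actual binary outcome of each game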
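
A minimal Python sketch of the metric on this slide, assuming it is the standard binary log loss averaged over games. The function name `log_loss`, the clipping constant `eps`, and the sample outcomes and probabilities are illustrative and do not come from the deck.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss over a list of games.

    y_true: actual outcomes, 1 if team A won and 0 otherwise.
    y_prob: predicted probabilities that team A beats team B.
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip so that a prediction of exactly 0 or 1 never hits log(0).
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Made-up outcomes and three made-up prediction strategies.
outcomes = [1, 0, 1, 1]
hedged = [0.5, 0.5, 0.5, 0.5]              # never commits
confident_right = [0.9, 0.1, 0.9, 0.9]     # confident and correct
confident_wrong = [0.9, 0.1, 0.9, 0.01]    # last game: confident and wrong

loss_hedged = log_loss(outcomes, hedged)
loss_right = log_loss(outcomes, confident_right)
loss_wrong = log_loss(outcomes, confident_wrong)
```

In this toy data a single confident miss pushes the loss above that of the fully hedged predictions, which is the trade-off between conservative and confident predictions that the slide describes.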

ANALYSIS PROCESS

Model Evaluation

DATA PREPARATION

Feature Transformation and Normalization

Rank to Score: Team 1 Adjusted Seed = 0.5 + 0.03 * (Team 2 Seed - Team 1 Seed)

Normalization: MinMax Scaler

Derive differences: Team 1 score of an attribute - Team 2 score of an attribute
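
The three transformations on this slide can be sketched in plain Python as follows. The function names and sample values are hypothetical. The min-max step is re-implemented by hand here, although "MinMax Scaler" on the slide likely refers to a library scaler.

```python
def adjusted_seed(team1_seed, team2_seed):
    """Rank to score: 0.5 + 0.03 * (Team 2 Seed - Team 1 Seed)."""
    return 0.5 + 0.03 * (team2_seed - team1_seed)


def min_max_scale(values):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def attribute_difference(team1_value, team2_value):
    """Team 1 score of an attribute minus Team 2 score of the same attribute."""
    return team1_value - team2_value


# Made-up example: a 1 seed against a 16 seed, and the reverse matchup.
favorite_score = adjusted_seed(1, 16)
underdog_score = adjusted_seed(16, 1)

# Made-up raw attribute values for three teams, scaled before differencing.
scaled = min_max_scale([60, 70, 80])
matchup_feature = attribute_difference(scaled[2], scaled[0])
```

With seeds from 1 to 16, the adjusted seed stays between 0.05 and 0.95, so it can be read as a rough prior probability that Team 1 wins.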

FEATURE SELECTION

Correlation and Distribution

Feature Correlation Heatmap: a few features are linearly correlated.

Feature Distribution Histogram: most features are normally distributed.
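
As a rough illustration of the correlation check behind the heatmap, the sketch below computes Pearson correlations between feature columns and flags strongly correlated pairs. The feature names, values, and the 0.9 threshold are made up for the example. The deck does not state which threshold, if any, was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical per-team feature columns.
features = {
    "points_per_game": [60.0, 65.0, 70.0, 75.0, 80.0],
    "field_goals_made": [22.0, 24.0, 26.0, 28.0, 30.0],
    "turnovers": [12.0, 9.0, 14.0, 10.0, 13.0],
}
flagged = correlated_pairs(features)
```

Pairs flagged this way are candidates for dropping one of the two features before model fitting.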

FEATURE SELECTION

Importance Plotting and Recursive Elimination

Figures: Feature Importance; Log Loss for Different Feature Numbers

Optimal Number of Features: 9

● Reduced from 97 features to 9 features
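
One way to sketch recursive elimination is with scikit-learn's `RFE`, shown below on synthetic data. This is an assumption about tooling, not the team's confirmed pipeline. The data set, the random forest settings, and the 20 starting features are placeholders for the real 97-feature table. The target of 9 features is fixed here, whereas the deck chose it by comparing log loss across feature counts.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the matchup table: 300 games, 20 candidate features.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# The forest's feature importances decide which feature is dropped each round.
forest = RandomForestClassifier(n_estimators=25, random_state=0)
selector = RFE(estimator=forest, n_features_to_select=9, step=1)
selector.fit(X, y)

# Column indices that survived elimination, and the reduced feature matrix.
kept_columns = [i for i, keep in enumerate(selector.support_) if keep]
X_reduced = selector.transform(X)
```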

PERFORMANCE VALIDATION

Cross Validation and Different Training Size

Grid Searching/Parameter Tuning

Acceptable Model Performance Variation

Learning Curve

Overfitting when Training Size under 45%

Partition Size: 50% - 50%
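
A hedged sketch of how a 50% - 50% partition, cross validation, and grid search could fit together, using scikit-learn on synthetic data. The logistic regression model, the grid over `C`, and the 5 folds are assumptions chosen for illustration. The deck does not list the parameters that were tuned.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 9 selected features.
X, y = make_classification(
    n_samples=400, n_features=9, n_informative=5, random_state=0
)

# 50% - 50% partition between training and held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# Grid search over the regularization strength, scored by cross-validated log loss.
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_log_loss",
    cv=5,
)
search.fit(X_train, y_train)

# Log loss of the refitted best model on the held-out half.
test_loss = log_loss(y_test, search.predict_proba(X_test)[:, 1])
```

Comparing the cross-validated score in `search.best_score_` with `test_loss` gives a quick check on whether performance variation between folds and held-out data is acceptable.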

PERFORMANCE VALIDATION

Model Fusion

Random Forest (RF), Gradient Boosted Trees (GBT) and Logistic Regression are the top 3 models

Majority Voting

● Leverage the information gleaned from different methods
● Minimize the flaws in each model
● Increase stability and improve accuracy
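
Majority voting over three models can be sketched in a few lines of plain Python. The model names mirror the slide, but the per-game predictions are made up, and the helper names `majority_vote` and `fuse_predictions` are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most models for one game."""
    return Counter(labels).most_common(1)[0][0]


def fuse_predictions(per_model_labels):
    """Combine per-model label lists (one label per game) by majority vote."""
    # zip(*...) regroups the lists so each tuple holds every model's vote
    # for the same game.
    per_game_votes = zip(*per_model_labels.values())
    return [majority_vote(votes) for votes in per_game_votes]


# Made-up predictions for four games: 1 means "team A wins".
votes = {
    "random_forest": [1, 0, 1, 1],
    "gradient_boosted_trees": [1, 1, 0, 1],
    "logistic_regression": [0, 0, 1, 1],
}
fused = fuse_predictions(votes)
```

With an odd number of models and binary labels, a tie cannot occur. Note that this produces hard labels only; the deck does not say how fused probabilities for the log loss metric were obtained.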

PREDICTION REVIEW

Predicted Probability Distribution for 2016 NCAA

Our model is more confident about “Gonna Win” teams, while staying ambiguous about “Gonna Lose” teams.

PREDICTION REVIEW

Predicted Round of 32 for 2016 NCAA

Our Model Accurately Predicted 25 out of 32.

Accuracy: 78%

PREDICTION REVIEW

Predicted Sweet 16 for 2016 NCAA

Our Model Accurately Predicted 12 out of 16.

Accuracy: 75%

PREDICTION REVIEW

Predicted Elite Eight for 2016 NCAA

Our Model Accurately Predicted 6 out of 8.

Accuracy: 75%

INTERESTING ANALYSIS

Top Teams and Cinderella Teams

Top Eight Teams from 2002 to 2015

Detailed performance of the eight top teams in each season?

INTERESTING ANALYSIS

Eight Top Teams

UNC, Michigan St., Connecticut, Kansas, Kentucky, Duke, Louisville, Florida

Championship Count:
1. Connecticut (3 times)
2. Duke; UNC; Florida (twice)
3. Kansas; Kentucky; Louisville (once)

Years Count:
1. Kansas (12 years)
2. Duke; UNC; Kentucky (11 years)
3. Florida; Michigan St. (10 years)

No Championship: Michigan St.

INTERESTING ANALYSIS

Top Teams and Cinderella Teams

Most Frequent “Cinderella” from 2002 to 2015

We define a Cinderella as the winning team in a game that has a higher seed number and a lower RPI than its opponent.

Top Teams that were also Cinderella: Michigan St., Connecticut, Kentucky
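
The Cinderella definition can be expressed as a small predicate and counted per team. This sketch reads "higher seed" as a larger seed number (a weaker seed) and "lower RPI" as a smaller RPI rating, which is an interpretation of the slide. The team names, seeds, and RPI values are made up.

```python
from collections import Counter


def is_cinderella(winner_seed, loser_seed, winner_rpi, loser_rpi):
    """True when the winner had the larger seed number and the smaller RPI.

    Assumption: a larger seed number is a weaker seed (16 is weaker than 1),
    and a smaller RPI rating indicates a weaker regular season.
    """
    return winner_seed > loser_seed and winner_rpi < loser_rpi


# Made-up games: (winner, winner_seed, winner_rpi, loser, loser_seed, loser_rpi)
games = [
    ("Team A", 11, 0.58, "Team B", 6, 0.62),
    ("Team C", 2, 0.66, "Team D", 15, 0.52),
    ("Team A", 11, 0.58, "Team E", 3, 0.64),
]

# Count how often each team won as a Cinderella.
cinderella_counts = Counter()
for winner, w_seed, w_rpi, _loser, l_seed, l_rpi in games:
    if is_cinderella(w_seed, l_seed, w_rpi, l_rpi):
        cinderella_counts[winner] += 1
```

Sorting `cinderella_counts` by count would give a "most frequent Cinderella" ranking of the kind shown on this slide.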

INTERESTING ANALYSIS

Cinderella Teams

We define a Cinderella as the winning team in a game that has a higher seed number and a lower RPI than its opponent.

Model Prediction for Cinderella

Our model correctly identified all Cinderella teams.

Mean Score: 80%

CONCLUSION

Model Accuracy

On Training Dataset: Log_loss: 0.46; Accuracy: 81%

On 2016 Testing Dataset: Accuracy: 75%-78%

Primary Factors for Win-Lose:

Self Attribute (importance descending)
● offensive efficiency
● defensive efficiency
● blocked shots

Opponent Attribute
● 2 point field goals shooting
● 3 point field goals shooting

Outer Factor
● distance

Useful Indicator
● RPI
● seed

