Post on 09-Aug-2020
Optimizing Baseball Performance and Player Salary
Michael Grenon
CS378 Data Mining
Spring 2018
Baseball ♥ Stats
North America ♥ Baseball
Franchise
Entertainment organization
money
data
wins
2017*
Optimization
Salary optimization
How did teams optimize their player salaries?
● No Salary Cap!
● Similarity problem
  ○ Linear correlation
  ○ Pearson correlation coefficient
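For reference, the Pearson correlation coefficient named above has the standard definition below. Reading x_i as a team's payroll and y_i as its win total is an assumption made here for illustration; the slide does not say which two variables were paired.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Values of r near +1 or -1 indicate a strong linear association; values near 0 indicate little linear association.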
How to Play to Win?
Which aspects of play most strongly correlate with winning?
● Similarity problem
  ○ (Linear) association
  ○ Pearson correlation coefficient
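One way to ask which stats correlate most strongly with winning is to compute r for every stat against wins and sort by absolute value. The sketch below is a minimal pure-Python version. The function names, stat names, and toy numbers are all hypothetical and are not taken from the talk's data or code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def rank_by_correlation(stats, target):
    """Return (stat name, r) pairs sorted by |r| against target, strongest first."""
    scored = [(name, pearson_r(values, target)) for name, values in stats.items()]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical toy numbers for four teams, not real 2017 data.
toy_stats = {
    "stat_a": [1.0, 2.0, 3.0, 4.0],  # rises in step with the target
    "stat_b": [4.0, 1.0, 3.0, 2.0],  # only loosely related to the target
}
toy_wins = [70, 80, 90, 100]
ranking = rank_by_correlation(toy_stats, toy_wins)
```

Sorting on |r| rather than r keeps strongly negative associations near the top, which matters when a stat is "lower is better".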
How do the best teams use their players?
How frequently are certain players used in games?
● Frequent item set problem
  ○ Apriori algorithm
  ○ Support threshold?
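To make the frequent item set framing concrete, here is a minimal Apriori sketch. It assumes each game is one transaction whose items are the players who appeared in it. The player labels, the toy games, and the 0.5 support threshold are hypothetical; the slide leaves the threshold as an open question.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (frozenset): support} for every itemset meeting min_support.

    transactions: iterable of sets of items (here, players used in one game).
    min_support: minimum fraction of transactions that must contain the itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = keep_frequent(singles)
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data: five games, three players.
toy_games = [
    {"player_a", "player_b", "player_c"},
    {"player_a", "player_b"},
    {"player_a", "player_c"},
    {"player_b", "player_c"},
    {"player_a", "player_b", "player_c"},
]
frequent_sets = apriori(toy_games, min_support=0.5)
```

In this toy data each pair appears in 3 of 5 games and passes the threshold, while the full trio appears in only 2 of 5 and is dropped. That shows how the choice of support threshold decides which player combinations count as "frequent".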
R = 0.253
Price Per Win
E = win% * (ppw rank + win rank)
R = -0.555
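The efficiency score E above can be sketched as follows, under several assumptions the slide does not confirm: price per win is taken as payroll divided by wins, both ranks run from 1 (worst) to N (best) so that a higher E means a more efficient team, and ties are broken by sort order. The function name, team names, and figures are hypothetical.

```python
def efficiency_scores(teams):
    """Compute E = win% * (ppw rank + win rank) for each team.

    teams: {name: (payroll, wins, losses)}
    Assumption: rank 1 is the worst team and rank N the best, for both
    price per win (cheapest is best) and wins (most is best).
    """
    for name, (payroll, wins, losses) in teams.items():
        if wins <= 0:
            raise ValueError(f"{name}: price per win needs at least one win")

    ppw = {name: payroll / wins for name, (payroll, wins, _) in teams.items()}
    win_pct = {name: wins / (wins + losses)
               for name, (_, wins, losses) in teams.items()}

    # Most expensive price per win first, so it receives rank 1 (worst).
    by_ppw = sorted(teams, key=lambda name: ppw[name], reverse=True)
    # Fewest wins first, so it receives rank 1 (worst).
    by_wins = sorted(teams, key=lambda name: teams[name][1])

    ppw_rank = {name: i + 1 for i, name in enumerate(by_ppw)}
    win_rank = {name: i + 1 for i, name in enumerate(by_wins)}

    return {name: win_pct[name] * (ppw_rank[name] + win_rank[name])
            for name in teams}

# Hypothetical payrolls (in millions) and records, not real 2017 figures.
toy_teams = {
    "team_x": (100.0, 100, 62),
    "team_y": (200.0, 80, 82),
    "team_z": (60.0, 70, 92),
}
scores = efficiency_scores(toy_teams)
best_team = max(scores, key=scores.get)
```

With these toy numbers team_x scores highest: it has the most wins and a mid-range price per win. If the original ranks ran the other way (1 = best), lower E would indicate higher efficiency.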
BsR ≈ Baserunning WAR
Taken from team_batting(2017)
R = 0.548
wSL ≈ weighted Slider
Taken from team_batting(2017)
R = 0.500
¯\_(ツ)_/¯
What’s next?
How frequently are certain players used in games?
● Frequent item set problem
  ○ Apriori algorithm
  ○ Support threshold?
What’s next?
To what extent are win-loss record and attendance related?
● Extending Pearson correlation analysis
Preliminary Conclusions
● Hitting coaches: teach how to hit a slider
  ○ Pitch most “cost-efficient” to excel at hitting
  ○ ...but not by much
● Fielding coaches: emphasize speed and skill on baserunning
  ○ More closely associated with salary efficiency than any other performance
    ■ Even batting, pitching
Preliminary Conclusions
● Chess match: all pieces are important
● 2-sided game
● Predictive (vs. descriptive) statistics
  ○ Time-series analysis
Preliminary Conclusions
● Too much data
  ○ batting_stats(): 287 attributes?
● Statcast data
  ○ More complex mining techniques
  ○ Neural Nets
● Data warehouses incomplete, disorganized
  ○ Private sector