NCAA
(Much) better than lotto tickets: Analytics and NCAA tournament win probabilities
Michael J. Lopez
Assistant Statistics Professor
Skidmore College
@StatsbyLopez
March 2, 2016
Introduction
Outline
- Introduction
  - Kaggle contest
  - What we did
  - Lucky or good?
- Discussion and Advice
  - Kaggle
  - Traditional pools
  - Sports analytics
Background
- Single-elimination tournament with 64-68 teams.
- $2.5 billion wagered on the tournament in 2012 (Boudway 2014; Tsu 2014)
- Reminder: gambling is still illegal in most places
Scoring formats:
1. Traditional 1:2:4:8:16:32 (Yahoo, ESPN, etc.)
  - Points for picking winners only; picks are made before the tournament
2. Kaggle "March Machine Learning Mania"
  - All games judged on predictive probabilities
Kaggle contest
Kaggle description
- ∼400 entries from 248 teams in 2014 (500 teams in 2015)
- Predict win probability for every possible game (2,278 contests)
- Only the 63 games actually played are used in scoring
- Scoring function:
  $$\text{LogLoss} = -\left[\, y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \,\right]$$
  where $\hat{y}$ is the predicted probability of a win and $y$ is the actual outcome (0 or 1).
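To make the scoring rule concrete, here is a minimal Python sketch of the log loss above. Averaging the per-game losses over all scored games and clipping predictions with `eps` are assumptions added for illustration; the slide only gives the per-game formula. The 2,278 figure is the number of distinct pairings among 68 teams.

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Average log loss over a list of games.

    y_true: actual outcomes (1 = win, 0 = loss)
    y_pred: predicted win probabilities
    eps:    clipping bound (an assumption here) so log(0) never occurs
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Number of possible pairings among 68 teams
n_matchups = math.comb(68, 2)

# One game that the predicted team actually won (y = 1)
confident_right = log_loss([1], [0.9])
confident_wrong = log_loss([1], [0.1])
coin_flip = log_loss([1], [0.5])
```

A confident correct pick is rewarded, a confident wrong pick is penalized heavily, and a 0.5 prediction always scores log(2) regardless of the outcome.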
- We* won this contest in 2014.
- I'll address a few main questions:
  - What did we do?
  - How lucky did we get?
  - Traditional NCAA pools
  - Lessons transferable to sports analytics
*Jointly with Gregory Matthews (Loyola-Chicago)
What we did
- Two sources of data:
  - Las Vegas point spread data (Model M1)
  - Ken Pomeroy efficiency ratings (Pomeroy 2012) (Model M2)
- Why efficiency?
- Logistic regression: outcome variable of 1 for a win and 0 for a loss.
- Model M1: 1 predictor
  - Spread
- Model M2: 5 predictors
  - Offensive efficiency (home, away)
  - Defensive efficiency (home, away)
  - Neutral indicator
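As a rough illustration of a Model M1-style fit, the sketch below estimates a one-predictor logistic regression by Newton's method using only the Python standard library. Everything about the data is an assumption: the spreads and outcomes are synthetic, the slope `true_b1` is hypothetical, and the spread is coded as the team's expected margin (positive means favored). It is not the authors' code or data, and Model M2 with five predictors would need a general-purpose solver.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_1d(x, y, n_iter=25):
    """Fit P(win) = sigmoid(b0 + b1 * x) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of log-likelihood, intercept
            g1 += (yi - p) * xi     # gradient of log-likelihood, slope
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic data (hypothetical): spread = expected margin for the team
random.seed(0)
true_b1 = 0.15
spreads = [random.uniform(-20, 20) for _ in range(2000)]
wins = [1 if random.random() < sigmoid(true_b1 * s) else 0 for s in spreads]

b0, b1 = fit_logistic_1d(spreads, wins)
p_favored_by_5 = sigmoid(b0 + b1 * 5.0)  # fitted win probability for a 5-point favorite
```

On the synthetic data the fitted slope should land close to the hypothetical value used to generate the outcomes, which is a useful sanity check before applying the same fit to real spreads.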
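- Find $w$ to minimize the LogLoss of $w \hat{y}_{M1} + (1 - w)\hat{y}_{M2}$, where $\hat{y}_{M1}$ and $\hat{y}_{M2}$ are the predicted win probabilities from Models M1 and M2
- In-sample versus out-of-sample testing
- Our submissions:
  - $S_1 = 0.75\,\hat{y}_{M1} + 0.25\,\hat{y}_{M2}$
  - $S_2 = 0.25\,\hat{y}_{M1} + 0.75\,\hat{y}_{M2}$ (Winning entry)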
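One simple way to choose the blending weight is a grid search over w in [0, 1], sketched below. The outcomes and the two probability vectors are made-up toy numbers, and the grid search itself is an assumption; the slides do not say how w was optimized.

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Average log loss; eps clipping avoids log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def blend(w, p1, p2):
    """Weighted average of two probability vectors: w * p1 + (1 - w) * p2."""
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]

def best_weight(y_true, p1, p2, steps=100):
    """Grid-search w in [0, 1] that minimizes the log loss of the blend."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: log_loss(y_true, blend(w, p1, p2)))

# Hypothetical toy data: outcomes and predictions from two models
outcomes = [1, 0, 1, 1, 0, 1]
p_m1 = [0.80, 0.30, 0.60, 0.90, 0.45, 0.55]
p_m2 = [0.70, 0.20, 0.75, 0.85, 0.30, 0.70]

w_hat = best_weight(outcomes, p_m1, p_m2)
blended_loss = log_loss(outcomes, blend(w_hat, p_m1, p_m2))
```

Because w = 0 and w = 1 are both on the grid, the in-sample blended loss can never be worse than either model alone. That guarantee does not carry over out of sample, which is why the in-sample versus out-of-sample comparison matters.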
Lucky or good?
How lucky were we?
- We simulated the tournament 10,000 times with differing "true" underlying win probabilities:
  - S1, S2
  - Mean of top 10 entries
  - Mean of all entries
  - All games 0.5
- In each simulated tournament we scored all entries and counted how often we won.
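The simulation idea can be sketched as follows. This is a simplified stand-in, not the study's procedure: it treats a fixed list of games as independent rather than propagating winners through a bracket, and the entry names and probabilities are hypothetical. Ties are broken in favor of whichever entry comes first.

```python
import math
import random

def game_loss(y, p, eps=1e-15):
    """Log loss for a single game; eps clipping avoids log(0)."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def simulate_win_rates(true_probs, entries, n_sims=10000, seed=1):
    """Estimate how often each entry has the lowest total log loss.

    true_probs: assumed "true" win probability for each game
    entries:    dict mapping entry name -> list of predicted probabilities
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in entries}
    for _ in range(n_sims):
        # Draw one set of outcomes from the assumed true probabilities
        outcomes = [1 if rng.random() < p else 0 for p in true_probs]
        scores = {
            name: sum(game_loss(y, p) for y, p in zip(outcomes, preds))
            for name, preds in entries.items()
        }
        winner = min(scores, key=scores.get)
        wins[winner] += 1
    return {name: count / n_sims for name, count in wins.items()}

# Hypothetical example: three entries scored on six games
true_probs = [0.9, 0.7, 0.6, 0.55, 0.8, 0.65]
entries = {
    "truth": list(true_probs),
    "coin_flip": [0.5] * 6,
    "overconfident": [0.99, 0.95, 0.9, 0.9, 0.97, 0.93],
}
rates = simulate_win_rates(true_probs, entries, n_sims=2000)
```

Swapping in different `true_probs` (for example, the mean of all entries, or 0.5 for every game) shows how sensitive an entry's chance of winning is to what the truth is assumed to be.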
Results, Kaggle tournament
- Given our probabilities as true probabilities:
  - Each entry: ∼15% chance of winning
  - Each entry: ∼50% chance of a top-10 finish
I We finished 4th in 2015
Discussion and Advice
Lessons, Kaggle tournament
1. You always need luck
2. Two prediction models combined outperform either alone
3. Better data ≥ complex models
Traditional pools
Lessons, traditional pools
1. Find upset pools: people pick too passively
2. Game Theory
  - Opponents' picks known in expectation
  - Find value: Duke 2010 (✓), Kentucky 2015 (X)
  - Consider n
3. Tools
  - kenpom.com, fivethirtyeight.com
  - http://www2.isye.gatech.edu/~jsokol/lrmc/
  - 'Who picked whom'
4. P(Loss) > P(Win)
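One simple way to quantify "value" in a traditional pool is to compare a team's estimated title probability with the share of opponents picking it. The ratio below is an illustrative heuristic and an assumption, not a method taken from the slides, and the team names and numbers are hypothetical.

```python
def value_ratios(win_prob, pick_share):
    """Ratio of estimated title probability to share of the pool picking each team.

    A ratio above 1 suggests the team is under-picked relative to its chances.
    Teams with a zero or missing pick share are skipped.
    """
    return {
        team: win_prob[team] / pick_share[team]
        for team in win_prob
        if pick_share.get(team, 0) > 0
    }

# Hypothetical inputs: model-based title odds and public pick percentages
win_prob = {"Team A": 0.30, "Team B": 0.15, "Team C": 0.10}
pick_share = {"Team A": 0.50, "Team B": 0.10, "Team C": 0.12}

ratios = value_ratios(win_prob, pick_share)
best_value = max(ratios, key=ratios.get)
```

In this toy example the favorite is the most likely champion but is over-picked, so a less popular team offers more value. How far to lean into that depends on the number of entries in the pool.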
Sports analytics
Lessons, sports analytics
1. Research
2. Visualize
3. Practice
4. Share
5. Improve
6. Generalize
Sidebar: R statistical software
Citations
- Lopez, M. J. and Matthews, G. J. 2015. 'Building an NCAA men's basketball predictive model and quantifying its success.' Journal of Quantitative Analysis in Sports 11(1): 5-12.
- Carlin, B. P. 1996. 'Improved NCAA Basketball Tournament Modeling Via Point Spread and Team Strength Information.' The American Statistician 50: 39-43.
- Pomeroy, K. 2012. Ratings Glossary. URL http://bit.ly/1LGb79q (accessed June 1, 2014).