Sports Statistics: First Class Meeting
Dale Zimmerman
January 20, 2017
1
First things
• Go through syllabus
• Discuss assignment
2
Some important dates in the history of sports statistics
• 1952: First sports statistics paper published in a major statistics journal (Frederick Mosteller’s analysis of World Series data in JASA)
• 1992: Creation of the Statistics in Sports section within the ASA
• 1994: September issue of JASA devoted entirely to sports statistics
• 2005: Inaugural issue of the ASA journal Journal of Quantitative Analysis in Sports published
3
Some sports statistics books
• Anthology of Statistics in Sports (2005), Albert, J., Bennett, J., and Cochran, J.J. (eds.), Philadelphia: SIAM.
• Analyzing Baseball Data with R (2014), Marchi, M. and Albert, J., Boca Raton, FL: CRC Press.
• Handbook of Statistical Methods and Analyses in Sports (2016), Albert, J., Glickman, M.E., Swartz, T.B., and Koning, R.H. (eds.), Boca Raton, FL: Chapman & Hall/CRC. (https://www.crcpress.com/Handbook-of-Statistical-Methods-and-Analyses-in-Sports/Albert-Glickman-Swartz-Koning/p/book/9781498737364)
4
Some interesting sports statistics graphics
1. Score progression plots
• Not that illuminating for most major team sports (football, baseball, soccer, hockey)
• First proposed for use in basketball by P.H. Westfall, “Graphical presentation of a basketball game,” The American Statistician, 44:305–307 (1990).
• Westfall’s idea was to display the difference in score as the game progresses.
• Example: the Iowa men’s basketball team’s first 6 losses in the 2014-15 season (next page):
5
6
[Figure: six score progression plots, one per game: Iowa vs Texas, Iowa vs Syracuse, Iowa vs Iowa State, Iowa vs UNI, Iowa vs Michigan State, Iowa vs Wisconsin. Horizontal axis: Elapsed Time (in minutes), 0 to 40. Vertical axis: Iowa Lead.]
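A minimal sketch of how the score difference behind such a plot could be computed from a list of scoring events. The event format, the team labels "us" and "them", and the function names are assumptions made for illustration; the events are made up and do not come from any of the games shown.

```python
from bisect import bisect_right


def lead_progression(events):
    """Return (times, leads) describing one team's lead as a step function.

    events: iterable of (minute, team, points) tuples, where minute is the
    elapsed game time and team is "us" or "them" (hypothetical labels).
    """
    times, leads = [0.0], [0]
    for minute, team, points in sorted(events, key=lambda e: e[0]):
        change = points if team == "us" else -points
        times.append(float(minute))
        leads.append(leads[-1] + change)
    return times, leads


def lead_at(times, leads, minute):
    """Lead in effect at a given elapsed time (right-continuous steps)."""
    return leads[bisect_right(times, minute) - 1]


# Made-up scoring events, not data from any real game.
events = [(0.5, "us", 2), (1.2, "them", 3), (2.0, "them", 2), (3.4, "us", 3)]
times, leads = lead_progression(events)
print(list(zip(times, leads)))
```

Plotting `leads` against `times` as a step function, with a horizontal reference line at zero, gives a display of the kind Westfall proposed.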
7
• Some websites now display scores of the two teams in cdf-like form for the game.
• There appears to be no published statistical analysis yet of these types of plots. However, the following paper is relevant: Gabel, A. and Redner, S., “Random walk picture of basketball scoring,” Journal of Quantitative Analysis in Sports, 8(1), 1416 (2012).
• A functional principal component analysis to uncover the dominant pattern of curves for a given team in a given season could be interesting.
8
2. Continuous-time win probability plots
• Common now for NFL, NCAA football, and NBA games.
• Go to
http://live.advancednflstats.com/index.php?gameid1=2015020100
to see how the win probability evolved over time in the 2015 Super Bowl between the Seattle Seahawks and the New England Patriots.
• We will learn one way these plots may be constructed (via random forests) in a few weeks.
9
3. Basketball shot charts
10
• It appears that one statistical analysis of shot chart data has been published: “A spatial analysis of basketball shot chart data,” by B.J. Reich, J. Hodges, B.P. Carlin, and A.M. Reich, The American Statistician, 60:3–12 (2006).
• The authors analyze the shot chart for Sam Cassell, a guard for the Minnesota Timberwolves, in 2003–2004. They reduce the shot chart to counts (and probabilities of made shots) in an 11×11 grid in polar coordinates, plus the semicircle within 2 feet of the basket.
11
• Authors’ objective: Understand how several covariates (whether at home or away, behind or ahead, 1st half or 2nd half, ...) affect Cassell’s shot success.
• Because some of the 122 cells have ≤ 4 shot attempts, borrowing strength from (smoothing over) neighboring cells is important for obtaining stable estimates of covariate effects.
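A rough sketch of how a shot chart might be reduced to 122 cells (one region within 2 feet of the basket plus an 11×11 polar grid). The coordinate convention, the 30-foot outer radius, the equal-width bins, and the function names are assumptions for illustration; the paper's actual cell boundaries may differ.

```python
import math

N_RADIAL = 11        # radial bins in the 11 x 11 grid
N_ANGULAR = 11       # angular bins in the 11 x 11 grid
INNER_RADIUS = 2.0   # feet; shots this close form one extra cell
MAX_RADIUS = 30.0    # feet; assumed outer edge, not taken from the paper
N_CELLS = 1 + N_RADIAL * N_ANGULAR  # 122 cells in total


def shot_cell(x, y):
    """Map a shot location to a cell index in 0..121.

    (x, y) are in feet relative to the basket, with y >= 0 pointing up-court
    (an assumed convention). Cell 0 is the region within INNER_RADIUS of the
    basket; cells 1..121 index the polar grid.
    """
    r = math.hypot(x, y)
    if r <= INNER_RADIUS:
        return 0
    # Angle in [0, pi]; shots from behind the baseline are clamped onto it.
    theta = math.atan2(max(y, 0.0), x)
    r_width = (MAX_RADIUS - INNER_RADIUS) / N_RADIAL
    r_bin = min(int((r - INNER_RADIUS) / r_width), N_RADIAL - 1)
    a_bin = min(int(theta / (math.pi / N_ANGULAR)), N_ANGULAR - 1)
    return 1 + r_bin * N_ANGULAR + a_bin


def tabulate(shots):
    """Count attempts and makes per cell from (x, y, made) triples."""
    attempts = [0] * N_CELLS
    makes = [0] * N_CELLS
    for x, y, made in shots:
        cell = shot_cell(x, y)
        attempts[cell] += 1
        makes[cell] += int(made)
    return attempts, makes


# Made-up shots, not Cassell's data.
attempts, makes = tabulate([(0, 1, True), (0, 10, False), (0, 10, True)])
```

With so many cells, most of them hold only a handful of attempts, which is why raw per-cell shooting percentages are unstable and some form of spatial smoothing is needed.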
12
Shot success: Model
• Shooting percentage was modeled with separate logistic regressions at each of the 122 cells.
• The logit of the success probability was modeled as a linear function of the covariates, and the regression coefficients were allowed to vary over cells.
• A spatial prior known as a CAR (conditional autoregressive) prior was placed on the regression coefficients, which smooths them spatially.
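One way to write down a model of this kind is sketched below. The notation is an assumption made for illustration: $Y_s$ indicates whether shot $s$ was made, $c(s)$ is the cell it was taken from, $\mathbf{x}_s$ is its covariate vector, $i \sim k$ means cells $i$ and $k$ are neighbors, and $m_i$ is the number of neighbors of cell $i$. The prior shown is the common intrinsic CAR form, applied to each coefficient separately; the specification in the paper may differ.

```latex
\begin{align*}
Y_s \mid p_s &\sim \mathrm{Bernoulli}(p_s), \\
\operatorname{logit}(p_s) &= \mathbf{x}_s^{\top} \boldsymbol{\beta}_{c(s)},
  \qquad c(s) \in \{1, \dots, 122\}, \\
\beta_{ij} \mid \{\beta_{kj} : k \neq i\} &\sim
  N\!\left( \frac{1}{m_i} \sum_{k \sim i} \beta_{kj},\; \frac{\tau_j^2}{m_i} \right).
\end{align*}
```

Under this prior each cell's coefficient is shrunk toward the average of its neighbors' coefficients, and the variance parameter $\tau_j^2$ controls how much smoothing is applied to covariate $j$.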
13
Shot locations: Results
• Significant covariate effects: NOKG (whether Kevin Garnett was not in the game), NOKG×NOLS (interaction of NOKG and whether Latrell Sprewell was not in the game), TEAMFGA (whether the Timberwolves took more than 80 shots that game)
14
4. Baseball spray charts
15
16
• A Google search suggests that no statistical analysis of spray charts has been published yet.
• Last year, Emily Eck did her Creative Component on this topic under my supervision.
• She analyzed data from all hitters (combined) for the entire 2014 MLB season. Among several other findings, she found that the distribution of balls put into play by RH batters is significantly different from (a) that of LH batters (no surprise there!), and (b) the mirror image of that of LH batters (somewhat surprising).
17
5. Baseball pitch location plots
Next two slides: Locations of balls and called strikes for all 370,673 called pitches from the 2014 MLB season (and postseason). The rule-book strike zone (which is a rectangle) is superimposed on both plots.
I will present a detailed analysis of the called strike zone in a few weeks.
18
19
20
Reading assignment for next week
• The paper by Gabel and Redner cited on page 7.
• “The home-court advantage in basketball: How large is it, and does it vary from team to team?” by D.A. Harville and M.H. Smith, The American Statistician, 48:22–28 (1994).
21