
Posted on 18-Jan-2021


Probability in Sport: Part 2

Jocelyn Mara, Discipline of Sport and Exercise Science

Law of Total Probability: e.g. 2013 MLB St Louis Cardinals

Severini, 2015

               Frequency   Percentage
Total wins     97          59.9
Total losses   65          40.1
Wins at home   54          66.7 (out of 81)
Wins away      43          53.1 (out of 81)
Total games    162         -

Law of Total Probability

P(W) = 0.599

P(W | H) = 0.667

P(W | not H) = 0.531

where W = the Cardinals win a game and H = the game is played at home

Severini, 2015

Law of Total Probability

• The number of home games and away games is the same, so…

• Their overall winning % is the same as the average of their home and away winning %

0.599 = (0.667 + 0.531) / 2

Severini, 2015

Law of Total Probability

• In probability notation, this is:

P(W) = P(H) P(W|H) + P(not H) P(W | not H)

0.599 = (0.5)(0.667) + (0.5)(0.531)

Severini, 2015

Law of Total Probability

• An unconditional probability P(A) (e.g. P(W)) can be expressed as a weighted average of the conditional probability of A given B (e.g. P(W | H)) and the conditional probability of A given not B (e.g. P(W | not H))

• The weights are the probabilities of B and not B (e.g. P(H) and P(not H))

• The weights were equal in the Cardinals example…

• … but this isn’t always the case

Severini, 2015
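The weighted-average form is easy to check numerically. Below is a minimal Python sketch (the slides' own code is R; Python is used here only for illustration) that rebuilds the Cardinals' P(W) from the win counts in the table. The helper `total_probability` is a hypothetical name of our own, not from the source.

```python
def total_probability(p_b, p_a_given_b, p_a_given_not_b):
    """Two-way law of total probability: P(A) = P(B)P(A|B) + P(not B)P(A|not B)."""
    return p_b * p_a_given_b + (1 - p_b) * p_a_given_not_b


# 2013 St Louis Cardinals: 81 home games out of 162, so P(H) = 0.5
p_home = 81 / 162
p_win_home = 54 / 81  # P(W | H)
p_win_away = 43 / 81  # P(W | not H)

p_win = total_probability(p_home, p_win_home, p_win_away)
print(round(p_win, 3))  # 0.599, i.e. 97 wins / 162 games
```

Because the weights here are both 0.5, the result is just the simple average of the two conditional probabilities; with unequal weights the same function gives the general weighted average.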

Law of Total Probability: e.g. Pitchers Josh Beckett and Johan Santana in 2009

Severini, 2015

Batting Average Against (BAA)   Josh Beckett   Johan Santana
Overall                         0.2441         0.2438
Against right-handed batters    0.226          0.235
Against left-handed batters     0.258          0.267


Law of Total Probability

• This can be explained by the fact that Beckett and Santana faced different proportions of right- and left-handed batters

Severini, 2015

Law of Total Probability

P(H) = P(R) P(H | R) + P(L) P(H | L)

where H = pitcher allows a hit, R = batter is right-handed, L = batter is left-handed

Severini, 2015

Law of Total Probability

e.g. Josh Beckett

0.244 = (0.432)(0.226) + (0.568)(0.258)

Severini, 2015

where 0.432 and 0.568 are the proportions of at-bats against Beckett by right-handed and left-handed batters, respectively

Law of Total Probability

e.g. Johan Santana

0.244 = (0.719)(0.235) + (0.281)(0.267)

Severini, 2015

where 0.719 and 0.281 are the proportions of at-bats against Santana by right-handed and left-handed batters, respectively

Law of Total Probability

• Beckett was better (lower BAA) than Santana against both right- and left-handed batters

• But Santana faced a much higher proportion of right-handed batters (0.719 vs 0.432), against whom both pitchers had a lower BAA

• Using conditional probabilities provides more information about relative performance than unconditional probabilities

Severini, 2015
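The same identity, generalised to any number of subclasses, shows the reversal numerically. In this minimal sketch (the helper `total_probability` and all variable names are our own), the rounded weights and subclass BAAs from the slides reproduce both overall BAAs to three decimals, even though Beckett is lower in each subclass:

```python
def total_probability(weights, conditional_probs):
    """P(A) = sum over j of P(B_j) * P(A | B_j), for a partition B_1, ..., B_m."""
    return sum(w * p for w, p in zip(weights, conditional_probs))


# BAA against right- and left-handed batters, in that order (2009)
beckett_baa = (0.226, 0.258)
santana_baa = (0.235, 0.267)
# Weights: share of right- and left-handed batters in each pitcher's at-bats
beckett_mix = (0.432, 0.568)
santana_mix = (0.719, 0.281)

beckett = total_probability(beckett_mix, beckett_baa)
santana = total_probability(santana_mix, santana_baa)

# Beckett has the lower BAA against both right- and left-handers...
print(all(b < s for b, s in zip(beckett_baa, santana_baa)))  # True
# ...but the overall BAAs are essentially equal
print(round(beckett, 3), round(santana, 3))  # 0.244 0.244
```

The four-decimal overall values in the table (0.2441 and 0.2438) come from the full data; the rounded weights only recover them to three decimals.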

Adjusting Sports Statistics

• What would Beckett’s overall BAA have been if he had faced the same proportion of right-handed batters as Santana?

BAA = (0.432) (0.226) + (0.568) (0.258) = 0.244

Adj. BAA = (0.719) (0.226) + (0.281) (0.258) = 0.235

Severini, 2015

Adjusting Sports Statistics

• In 2009, 56% of all MLB at-bats were by right-handed batters

Santana’s BAA = (0.719) (0.235) + (0.281) (0.267) = 0.244

Adj. BAA = (0.56) (0.235) + (0.44) (0.267) = 0.249

Severini, 2015
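Both adjustments keep each pitcher's subclass BAAs fixed and only swap in a different set of weights. A minimal sketch with a hypothetical helper `adjusted_baa`, using the figures from the slides:

```python
def adjusted_baa(standard_mix, baa_by_hand):
    """Direct adjustment: weight a pitcher's BAA vs (right, left) batters by a standard mix."""
    return sum(w * baa for w, baa in zip(standard_mix, baa_by_hand))


beckett_baa = (0.226, 0.258)  # vs right-, left-handed batters
santana_baa = (0.235, 0.267)

santana_mix = (0.719, 0.281)  # Santana's own right/left split
league_mix = (0.56, 0.44)     # 2009 MLB: 56% of at-bats by right-handers

print(round(adjusted_baa(santana_mix, beckett_baa), 3))  # 0.235
print(round(adjusted_baa(league_mix, santana_baa), 3))   # 0.249
```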

Adjusting Sports Statistics

• LA Lakers, 50 games into the 2017-18 NBA season

Severini, 2015

               Frequency   Percentage
Total wins     23          46.0
Total losses   27          54.0
Wins at home   11          50.0 (out of 22)
Wins away      12          42.9 (out of 28)
Total games    50          -

Adjusting Sport Statistics

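P(W) = P(H) P(W | H) + P(A) P(W | A), where H = home game and A = away game

0.46 = (0.44)(0.50) + (0.56)(0.429)

Adjusted to an even home/away split, as in a full 82-game season (41 home and 41 away games):

Adj. P(W) = (0.50)(0.50) + (0.50)(0.429) = 0.465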

Severini, 2015
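A minimal sketch that starts from the raw Lakers counts rather than rounded percentages (all variable names are our own):

```python
home_games, away_games = 22, 28
home_wins, away_wins = 11, 12

p_home = home_games / (home_games + away_games)  # 0.44
p_win_home = home_wins / home_games              # 0.50
p_win_away = away_wins / away_games              # 12/28, about 0.4286

# Observed P(W) with the actual 22/28 home/away split
observed = p_home * p_win_home + (1 - p_home) * p_win_away
# Adjusted P(W) with a standard 50/50 home/away split
adjusted = 0.5 * p_win_home + 0.5 * p_win_away

print(round(observed, 3), round(adjusted, 3))  # 0.46 0.464
```

With the unrounded away rate (12/28) the adjusted value is 0.464; the 0.465 above comes from rounding P(W | A) to 0.429 before multiplying.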

Adjusting Sport Statistics

• Formally this is known as subclassification adjustment

• AKA direct adjustment

P(W) = P(H) P(W | H) + P(A) P(W | A)

Severini, 2015

Here the subclasses are home (H) and away (A) games: P(W | H) and P(W | A) are the subclass win probabilities, and P(H) and P(A) are the subclass weights

Adjusting Sport Statistics

• Subclassification adjustment is not restricted to probabilities

• It can be used whenever the measurement of interest can be calculated by applying weights to the subclass measurements and summing them

Severini, 2015

Adjusting Sport Statistics

$Y = q_1 Y_1 + q_2 Y_2 + q_3 Y_3 + \cdots + q_m Y_m$

Y = measurement of interest

$q_x$ = subclass weights

$Y_x$ = subclass measurements

m = number of subclass measurements

Severini, 2015

Adjusting Sport Statistics

$Y^* = p_1 Y_1 + p_2 Y_2 + p_3 Y_3 + \cdots + p_m Y_m$

$Y^*$ = predicted measurement of interest in the hypothetical scenario where the subclass weights are $p_x$

$p_x$ = adjusted (standard) weights

$Y_x$ = subclass measurements

m = number of subclass measurements

Severini, 2015
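Since Y and Y* share one formula and differ only in the weights, a single function covers both. A minimal sketch, assuming the weights and measurements are supplied as parallel sequences in the same subclass order; `direct_adjustment` is a hypothetical name, and the numbers are the rounded Canberra United and Melbourne City shot-distance figures from the W-League example later in this section:

```python
from typing import Sequence


def direct_adjustment(weights: Sequence[float], measurements: Sequence[float]) -> float:
    """Weighted sum of subclass measurements: Y = sum q_x * Y_x, or Y* = sum p_x * Y_x."""
    if len(weights) != len(measurements):
        raise ValueError("need exactly one weight per subclass measurement")
    return sum(w * y for w, y in zip(weights, measurements))


# Canberra United goal probability by shot distance: 0-10m, 11-20m, 21-30m, >30m
canberra_rates = [0.261, 0.128, 0.0, 0.091]
canberra_mix = [0.214, 0.363, 0.372, 0.051]   # q_x: Canberra's own shot distances
melbourne_mix = [0.234, 0.479, 0.25, 0.036]   # p_x: Melbourne City's shot distances

y = direct_adjustment(canberra_mix, canberra_rates)        # observed Y
y_star = direct_adjustment(melbourne_mix, canberra_rates)  # hypothetical Y*
print(round(y, 3), round(y_star, 3))  # 0.107 0.126
```

The length check is a design choice: silently truncating with `zip` on mismatched inputs would hide a missing subclass.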

Adjusting Sport Statistics

• This type of adjustment is appropriate for rates or counts

• It is not appropriate for ratios of summary statistics

• Some metrics are already adjusted, so check before adjusting them again

• Choose the standard weights objectively

• Adjusted values might not be realistic, as they represent a hypothetical scenario

Severini, 2015

Adjusting Sport Statistics

• e.g. Canberra United FC Goal Scoring in the W-League

Distance   Shots   Goals   Goal Prob.
All        215     23      0.107
0-10m      46      12      0.261
11-20m     78      10      0.128
21-30m     80      0       0
>30m       11      1       0.091

Adjusting Sport Statistics

• e.g. Melbourne City Goal Scoring in the W-League

Distance   Shots   Goals   Goal Prob.
All        192     30      0.156
0-10m      45      11      0.244
11-20m     92      17      0.185
21-30m     48      1       0.021
>30m       7       1       0.143

Adjusting Sport Statistics

$G = d_{10} G_{10} + d_{20} G_{20} + d_{30} G_{30} + d_{>30} G_{>30}$

G = goal probability

$d_x$ = proportion of shots from each distance band

$G_x$ = goal probability for each distance band

Adjusting Sport Statistics

Canberra United

0.107 = (0.214)(0.261) + (0.363)(0.128) + (0.372)(0) + (0.051)(0.091)

Melbourne City

0.156 = (0.234)(0.244) + (0.479)(0.185) + (0.25)(0.021) + (0.036)(0.143)

Adjusting Sport Statistics

Canberra United adj. for Melbourne City standard (Melbourne City's shot-distance proportions used as weights)

0.126 = (0.234)(0.261) + (0.479)(0.128) + (0.25)(0) + (0.036)(0.091)

Melbourne City

0.156 = (0.234)(0.244) + (0.479)(0.185) + (0.25)(0.021) + (0.036)(0.143)

Adjusting Sport Statistics

• e.g. Goal Scoring in the W-League

Distance   Shots   Goals   Goal Prob.
All        921     90      0.098
0-10m      166     36      0.217
11-20m     390     44      0.113
21-30m     314     4       0.013
>30m       51      6       0.118

Adjusting Sport Statistics

$G = d_{10} G_{10} + d_{20} G_{20} + d_{30} G_{30} + d_{>30} G_{>30}$

0.098 = (0.180)(0.217) + (0.423)(0.113) + (0.341)(0.013) + (0.055)(0.118)

Adjusting Sport Statistics

Canberra United Adj. for League Standard

0.106 = (0.180)(0.261) + (0.423)(0.128) + (0.341)(0) + (0.055)(0.091)

Melbourne City Adj. for League Standard

0.137 = (0.180)(0.244) + (0.423)(0.185) + (0.341)(0.021) + (0.055)(0.143)
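The same calculation can be run from the raw shot and goal counts in the three tables, which avoids accumulating rounding error. A minimal sketch; the dictionary layout and helper names are our own:

```python
# Shots and goals by distance band: 0-10m, 11-20m, 21-30m, >30m
shots = {
    "Canberra United": [46, 78, 80, 11],
    "Melbourne City": [45, 92, 48, 7],
    "W-League": [166, 390, 314, 51],
}
goals = {
    "Canberra United": [12, 10, 0, 1],
    "Melbourne City": [11, 17, 1, 1],
    "W-League": [36, 44, 4, 6],
}


def shot_mix(counts):
    """Proportion of shots taken from each distance band (the d_x weights)."""
    total = sum(counts)
    return [c / total for c in counts]


def band_goal_probs(team):
    """Goal probability within each distance band (the G_x measurements)."""
    return [g / s for g, s in zip(goals[team], shots[team])]


def goal_probability(weights, probs):
    return sum(w * p for w, p in zip(weights, probs))


league_mix = shot_mix(shots["W-League"])
results = {}
for team in ("Canberra United", "Melbourne City"):
    observed = goal_probability(shot_mix(shots[team]), band_goal_probs(team))
    adjusted = goal_probability(league_mix, band_goal_probs(team))
    results[team] = (observed, adjusted)
    print(f"{team}: observed {observed:.5f}, league-adjusted {adjusted:.5f}")
```

Working from counts, each team's observed value collapses to total goals over total shots (23/215 and 30/192), and the league-adjusted values round to 0.106 and 0.137, matching the hand calculations above.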

Z-scores

• Can be used to standardise and compare performances

• Expresses an individual performance value as the number of standard deviations it is above or below the mean

Z-scores

$z = \dfrac{x - \mu}{\sigma}$

where $x$ = individual value, $\mu$ = mean, $\sigma$ = standard deviation


Z-scores

> (38.5 - mean(lakers$FG.)) / sd(lakers$FG.)

[1] -1.543456

> (53.0 - mean(lakers$FG.)) / sd(lakers$FG.)

[1] 1.304611
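The same z-score arithmetic in Python, as a minimal sketch: the helpers `z_score` and `z_scores` are our own names, and the mean (46.358) and SD (5.091172) are the rounded Lakers FG% values printed by R later in this section, so results agree with R to about three decimals. `z_scores` mirrors R's `scale()` by using the sample SD (n − 1 denominator), which is what `statistics.stdev` computes.

```python
import statistics


def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def z_scores(values):
    """Standardise a whole column, like R's scale(): sample mean and sample SD (n - 1)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [z_score(v, m, s) for v in values]


# Lakers FG% mean and SD as reported by R's mean() and sd()
fg_mean, fg_sd = 46.358, 5.091172
print(round(z_score(38.5, fg_mean, fg_sd), 3))  # -1.543
print(round(z_score(53.0, fg_mean, fg_sd), 3))  # 1.305
print(z_scores([1.0, 2.0, 3.0]))                # [-1.0, 0.0, 1.0]
```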

Z-scores

> scale(lakers$FG.)
           [,1]
[1,] -0.8756333
[2,] -0.7774241
[3,] -1.5434559
[4,]  0.6171466
[5,]  1.3046111
## first 5 rows of output ##

Z-scores

> library(dplyr)
> # as.numeric() drops the one-column matrix that scale() returns
> lakers <- mutate(lakers, FG.z = as.numeric(scale(FG.)))

Standard Normal Distribution

[Figures: standard normal density curve, with central areas of 68%, 95% and 99% shaded in successive panels]

a      P(−a < Z < a)
0.5    0.383
1      0.683
1.5    0.866
2      0.954
3      0.997
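The probabilities in this table follow from P(−a < Z < a) = 2Φ(a) − 1 = erf(a/√2), where Φ is the standard normal CDF. A minimal sketch using only Python's standard-library `math.erf` (the helper name `central_prob` is our own):

```python
import math


def central_prob(a):
    """P(-a < Z < a) for a standard normal Z, via the error function."""
    return math.erf(a / math.sqrt(2))


table = {a: round(central_prob(a), 3) for a in (0.5, 1, 1.5, 2, 3)}
print(table)  # {0.5: 0.383, 1: 0.683, 1.5: 0.866, 2: 0.954, 3: 0.997}
```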

Normal Distribution

> plot(density(lakers$FG.))

> mean(lakers$FG.)

[1] 46.358

> sd(lakers$FG.)

[1] 5.091172


[Figure: density plot of Lakers FG% with the central 68% of the area shaded]
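As a rough guide, assuming Lakers FG% is approximately normal (which the density plot helps judge), the mean and SD from the R output give the ranges below. A minimal sketch; the variable names are our own:

```python
# Lakers FG% summary values from the R output
fg_mean, fg_sd = 46.358, 5.091172

# If FG% were roughly normal, about 68% of games would fall within 1 SD
# of the mean and about 95% within 2 SD
one_sd = (fg_mean - fg_sd, fg_mean + fg_sd)
two_sd = (fg_mean - 2 * fg_sd, fg_mean + 2 * fg_sd)

print([round(v, 1) for v in one_sd])  # [41.3, 51.4]
print([round(v, 1) for v in two_sd])  # [36.2, 56.5]
```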

Comparing Performances

Player            Year   Receiving Yards
Calvin Johnson    2012   1964
Marvin Harrison   2002   1722
Jerry Rice        1995   1848
John Jefferson    1980   1340
Otis Taylor       1971   1110
Raymond Berry     1960   1298

Severini, 2015

Top Receiving Yard Performances in Different NFL Eras

Comparing Performances

• Comparing raw yardage directly might be misleading, as the passing game has changed across eras

• We can account for these differences by comparing each receiver to the other receivers who played in the same season

Severini, 2015

Comparing Performances

Year   Mean    SD      Z-Score
2012   269.2   329.6   5.142
2002   291.2   325.8   4.392
1995   288.4   342.8   3.600
1980   280.5   278.4   3.806
1971   224.3   223.8   3.958
1960   234.0   263.9   4.032

Severini, 2015

Mean and SD are for all players in each year

Adjusting Sport Statistics

Player            Calvin Johnson   Marvin Harrison
Year              2012             2002
Receiving Yards   1964             1722
Mean              269.2            291.2
SD                329.6            325.8
Z-score           5.142            4.392

Severini, 2015

Adjusting Sport Statistics: How would Marvin Harrison have performed in 2012?

Severini, 2015

$z = \dfrac{x - 269.2}{329.6}$, where 269.2 is the 2012 mean and 329.6 is the 2012 SD

Setting $z$ to Harrison's 2002 z-score:

$4.392 = \dfrac{x - 269.2}{329.6}$

Severini, 2015

Solving $4.392 = \dfrac{x - 269.2}{329.6}$ for $x$:

$x = (4.392)(329.6) + 269.2$

$x \approx 1717$ (Harrison's adjusted yards for 2012)
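The back-transformation is the z-score formula solved for x, i.e. x = z × SD + mean. A minimal sketch with hypothetical helper names `z_score` and `from_z_score`, computing Harrison's z-score from his raw 2002 numbers rather than the rounded 4.392:

```python
def z_score(x, mean, sd):
    return (x - mean) / sd


def from_z_score(z, mean, sd):
    """Invert z = (x - mean) / sd: the value with z-score z on another season's scale."""
    return z * sd + mean


# Harrison's 2002 yards relative to all 2002 players
z_harrison = z_score(1722, 291.2, 325.8)
# The same z-score placed on the 2012 mean and SD
harrison_2012 = from_z_score(z_harrison, 269.2, 329.6)

print(round(z_harrison, 3), round(harrison_2012))  # 4.392 1717
```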

Adjusting Sport Statistics

Player            Year   Yards   Adj. Yards
Calvin Johnson    2012   1964    1964
Marvin Harrison   2002   1722    1717
Jerry Rice        1995   1848    1456
John Jefferson    1980   1340    1524
Otis Taylor       1971   1110    1574
Raymond Berry     1960   1298    1598

Severini, 2015

Adjusted Receiving Yards (2012 equivalent)
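Applying the same back-transformation to every row reproduces the Adj. Yards column from the z-scores in the earlier table and the 2012 mean and SD. A minimal sketch; the names are our own:

```python
MEAN_2012, SD_2012 = 269.2, 329.6

# Z-scores from the earlier table, each relative to the player's own season
season_z = {
    "Calvin Johnson": 5.142,
    "Marvin Harrison": 4.392,
    "Jerry Rice": 3.600,
    "John Jefferson": 3.806,
    "Otis Taylor": 3.958,
    "Raymond Berry": 4.032,
}

# Adjusted yards: the 2012 value with the same z-score, x = z * SD + mean
adjusted_yards = {p: round(z * SD_2012 + MEAN_2012) for p, z in season_z.items()}
print(adjusted_yards)
```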

Summary

• The law of total probability can be used to adjust and predict sport statistics for a given scenario

• Z-scores can be used to compare performances with the mean performance

• Z-scores can be used to adjust sport statistics to a given reference mean and SD

• Consider carefully which mean and SD you use as your reference