IntroductionModellingOutcomes
Predicting the Rugby World Cup with a Log-Linear Score Model
Email: [email protected]
Hargreaves Lansdown
Bristol Data Scientists, October 2019
M. Box Predicting the Rugby World Cup
Outline
1 Introduction
  - My background in data science
  - Rugby World Cup: what and why?
2 Modelling
  - Data, ideas, assumptions
  - Sketch of the model
  - Methods
  - Model validation
3 Outcomes
  - What does the model tell us?
  - Predictions: group stage
  - Predictions: knock-out
My Background in Data Science
- Mathematics BSc from University of the West of England (genetic algorithms in combinatorial optimisation)
- Bristol Centre for Complexity Sciences, University of Bristol
- PhD on statistical modelling of spike trains (Box, M., Jones, M.W. and Whiteley, N., 2016. A hidden Markov model for decoding and the analysis of replay in spike trains. Journal of Computational Neuroscience, 41(3), pp.339-366)
- Data Scientist at Hargreaves Lansdown
My Background in Data Science: Past attempts at sports modelling and prediction
- Horse racing
- Rugby
- Football
Rugby World Cup: Rugby union - about
Figure: Rugby: Running
Figure: Rugby: A try
Figure: Rugby: Kicking
Figure: Rugby: Tackling
Rugby World Cup: Details of the competition
- 20 teams
- Two stages: group (pool) stage, knockout stage
- 4 pools of 5 teams each ($4 \times \binom{5}{2} = 40$ matches)
- 4 quarter finals, 2 semi finals, final, third place playoff (8 matches)
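The match counts above can be checked with a few lines of arithmetic. This is only a sketch: the function name `pool_matches` and the constant `KNOCKOUT_MATCHES` are illustrative and do not come from the talk.

```python
from math import comb


def pool_matches(pools: int, teams_per_pool: int) -> int:
    """Round-robin matches over all pools: every pair in a pool meets once."""
    return pools * comb(teams_per_pool, 2)


# Quarter finals, semi finals, final, third place playoff.
KNOCKOUT_MATCHES = 4 + 2 + 1 + 1

group_total = pool_matches(4, 5)
tournament_total = group_total + KNOCKOUT_MATCHES
print(group_total, KNOCKOUT_MATCHES, tournament_total)
```

So the model has 48 matches to predict in total, 40 of them in the pool stage.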
Modelling: Data
Match  Team      Opponent  Score  Feature_1  Feature_2  Feature_k
1      Wales     England   19     2011       1          ...
1      England   Wales     26     2011       0          ...
2      France    Scotland  34     2011       1          ...
2      Scotland  France    21     2011       0          ...
...    ...       ...       ...    ...        ...        ...
Table: Example data
Sources: www.scorespro.com/rugby-union/, www.oddsportal.com/rugby-union/
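The table stores each match twice, once from each team's point of view. A minimal sketch of building that layout from one record per match follows. It assumes Feature_1 is the year and Feature_2 is a home indicator; the slides do not name these features, so both readings, along with `Row` and `to_long_format`, are hypothetical.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    match: int
    team: str
    opponent: str
    score: int
    year: int   # assumed meaning of Feature_1
    home: int   # assumed meaning of Feature_2 (1 = home, 0 = away)


def to_long_format(match_id: int, home_team: str, away_team: str,
                   home_score: int, away_score: int, year: int) -> List[Row]:
    """Return two rows per match, one for each team's score."""
    return [
        Row(match_id, home_team, away_team, home_score, year, 1),
        Row(match_id, away_team, home_team, away_score, year, 0),
    ]


rows = to_long_format(1, "Wales", "England", 19, 26, 2011)
rows += to_long_format(2, "France", "Scotland", 34, 21, 2011)
for row in rows:
    print(*row)
```

With this layout every row has a single score as the response, which is what the score model below needs.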
Figure: Scores for and against, all matches (one panel per team: Argentina, Australia, Canada, England, Fiji, France, Georgia, Ireland, Italy, Japan, Namibia, New Zealand, Russia, Samoa, Scotland, South Africa, Tonga, Uruguay, USA, Wales; x-axis: year, 2012 to 2020; y-axis: −100 to 100)
Figure: Margin, running mean, all matches (one panel per team, same 20 teams; x-axis: year, 2012 to 2020; y-axis: −20 to 40)
Modelling: Modelling decisions
Boys, R.J. and Philipson, P.M., 2019. On the ranking of test match batsmen. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), pp.161-179.
- Score of team i against opponent j to be the dependent variable.
- Circumstances of the match used as predictive features.
- Want to use past performance as a guide, but make it time-dependent.
Modelling: Modelling assumptions
- Score of team i in match m is conditionally independent of all scores in all matches given the circumstances of match m.
- Given the circumstances of match m and the year t(m), score of team i in match m is distributed as
$$S_{i,m} \sim \mathrm{Pois}(\mu_{i,m})$$
- Further assumptions below.
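Under these two assumptions the likelihood factorises over team-match scores, so the log-likelihood is a sum of Poisson log-probabilities. The sketch below shows that sum. The means in `means` are made-up placeholders rather than fitted values, and the function names are illustrative.

```python
from math import lgamma, log
from typing import Sequence


def poisson_logpmf(score: int, mu: float) -> float:
    """log P(S = score) for S ~ Pois(mu), using lgamma for log(score!)."""
    return score * log(mu) - mu - lgamma(score + 1)


def log_likelihood(scores: Sequence[int], means: Sequence[float]) -> float:
    """Conditional independence: the joint log-likelihood is a plain sum."""
    return sum(poisson_logpmf(s, mu) for s, mu in zip(scores, means))


scores = [19, 26]     # Wales and England scores from the example table
means = [20.0, 25.0]  # hypothetical mean scores mu_{i,m}
print(log_likelihood(scores, means))
```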
Sketch of the Model: Log-linear mean score model
$$\mu_{i,m} = f(x_{i,m}, \theta) \in \mathbb{R}$$
$$\log(\mu_{i,m}) = \eta(x_{i,m}, \theta) \in \mathbb{R}$$
$$\log(\mu_{i,m}) = a_{i,t(m)} - b_{j,t(m)} + \theta_{i,1} x_{i,m} - \theta_{j,2} x_{j,m}$$
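As a worked instance of the linear predictor, the sketch below evaluates the mean score for one team in one match. Each feature is treated as a single scalar, and every numeric value is invented for illustration; none of them are estimates from the talk.

```python
from math import exp, log


def mean_score(a_i: float, b_j: float, theta_i1: float, theta_j2: float,
               x_i: float, x_j: float) -> float:
    """mu = exp(a_i - b_j + theta_i1 * x_i - theta_j2 * x_j)."""
    eta = a_i - b_j + theta_i1 * x_i - theta_j2 * x_j
    return exp(eta)


# Hypothetical values: attack strength of team i, defence strength of
# opponent j, and one feature (for example a home indicator) per side.
mu = mean_score(a_i=3.2, b_j=0.2, theta_i1=0.1, theta_j2=0.05, x_i=1.0, x_j=0.0)
print(round(mu, 2))
```

The log link keeps the mean positive for any real value of the predictor, and a stronger opposing defence (larger b_j) lowers the expected score.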
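Sketch of the Model: Hidden autoregressive process for attack strength, defence strength
$$a_{i,t} = \alpha_{i,1} a_{i,t-1} + \alpha_{i,2} a_{i,t-2} + \varepsilon_{i,t}, \qquad \varepsilon_{i,t} \sim N(0, \sigma^2_{\alpha}).$$
$$b_{i,t} = \beta_{i,1} b_{i,t-1} + \beta_{i,2} b_{i,t-2} + \tau_{i,t}, \qquad \tau_{i,t} \sim N(0, \sigma^2_{\beta}).$$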
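To see what a second-order autoregressive path looks like, the recursion can be simulated forward from two starting values. This is a sketch for one team's attack strength only. The coefficients, noise scale and starting values are arbitrary, and `simulate_ar2` is an illustrative name.

```python
import random
from typing import List, Sequence


def simulate_ar2(alpha1: float, alpha2: float, sigma: float,
                 start: Sequence[float], n_steps: int,
                 rng: random.Random) -> List[float]:
    """Extend `start` (two initial values) by n_steps of an AR(2) recursion."""
    path = list(start)
    for _ in range(n_steps):
        eps = rng.gauss(0.0, sigma)  # Gaussian innovation with sd sigma
        path.append(alpha1 * path[-1] + alpha2 * path[-2] + eps)
    return path


rng = random.Random(2019)
attack = simulate_ar2(alpha1=0.5, alpha2=0.3, sigma=0.1,
                      start=[3.0, 3.1], n_steps=8, rng=rng)
print([round(a, 2) for a in attack])
```

In the model these strengths are latent: they are never observed directly and are inferred from the scores.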
Sketch of the Model: Hierarchical model
Figure: DAG of the hierarchical model
Fitting the Model: Bayesian inference
- Sample from posterior using MCMC (adaptive Gibbs-within-Metropolis).
- Algorithm coded in R.
- Posterior predictive distribution used for score predictions.
$$S^{h}_{i,m} \sim \mathrm{Pois}\left(e^{\eta(x_{i,m}, \theta^{h})}\right), \qquad h = 1, 2, \ldots, H$$
$$S_{i,m} = \frac{1}{H} \sum_{h=1}^{H} S^{h}_{i,m}$$
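The posterior predictive step can be sketched as follows: draw one Poisson score for each posterior sample of the linear predictor, then average. This is a Python illustration, not the R code used for the talk. The values in `eta_draws` are synthetic stand-ins for η(x, θ^h) at H posterior draws, and the Poisson sampler is Knuth's simple multiplication method, adequate for means of rugby-score size.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_poisson(mu: float, rng: random.Random) -> int:
    """Knuth's method: count uniforms until their product drops to exp(-mu)."""
    limit = math.exp(-mu)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def posterior_predictive(eta_draws: Sequence[float],
                         rng: random.Random) -> Tuple[List[int], float]:
    """One Poisson draw per posterior sample, plus their average."""
    draws = [sample_poisson(math.exp(eta), rng) for eta in eta_draws]
    return draws, sum(draws) / len(draws)


rng = random.Random(42)
H = 2000
eta_draws = [rng.gauss(3.0, 0.05) for _ in range(H)]  # synthetic posterior draws
draws, predicted_score = posterior_predictive(eta_draws, rng)
print(round(predicted_score, 1))
```

Keeping the individual draws, and not only their mean, is what makes posterior quantiles and intervals available later.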
Model Validation: Training and posterior predictive checks for validation
- Train the model using the first T − 1 years' data. Evaluate performance using the T-th year.
- Compute posterior mean and posterior quantiles of score.
- Compare with actual score using mean absolute error (MAE) and look at distributions.
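The two validation quantities can be sketched in a few lines: the MAE of the posterior mean, and the share of actual scores that fall inside a central posterior interval. The toy inputs are invented, and the quantile rule (nearest index on the sorted draws) is one simple choice among several, not necessarily the one used in the talk.

```python
from typing import List, Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def empirical_quantile(sorted_draws: Sequence[float], q: float) -> float:
    """Nearest-index quantile of an already sorted sample."""
    index = round(q * (len(sorted_draws) - 1))
    return sorted_draws[max(0, min(len(sorted_draws) - 1, index))]


def interval_coverage(actual: Sequence[float],
                      draws_per_score: Sequence[Sequence[float]],
                      level: float = 0.90) -> float:
    """Fraction of actual scores inside the central `level` interval."""
    lower_q = (1.0 - level) / 2.0
    upper_q = 1.0 - lower_q
    hits = 0
    for score, draws in zip(actual, draws_per_score):
        ordered = sorted(draws)
        if empirical_quantile(ordered, lower_q) <= score <= empirical_quantile(ordered, upper_q):
            hits += 1
    return hits / len(actual)


actual = [19, 26, 34, 21]                      # held-out scores (toy values)
posterior_means = [22.0, 24.0, 30.0, 15.0]     # hypothetical predictions
toy_draws: List[List[int]] = [list(range(10, 31)) for _ in actual]
print(mean_absolute_error(actual, posterior_means))
print(interval_coverage(actual, toy_draws, level=0.90))
```

If the predictive distribution were well calibrated, about 90% of held-out scores would fall inside the 90% interval.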
Model Validation: Posterior predictive checks: findings
- 95% posterior interval corresponds to about 3 tries; 90% interval to about 2.6 tries.
- 78% of scores fall in the 90% posterior interval for score.
Figure: Test data posterior predictive distribution (actual vs predicted; y-axis: score, 0 to 75)
Figure: 6 Nations 2019 results, test data (two panels covering all 15 matches, each labelled by its team pair; x-axis: score)
MAE in test data: 6.1

        Correct  Incorrect
Home    37       6
Away    20       4
Table: Confusion matrix for predictions, test data
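The counts in the table translate into accuracy figures as follows. This is only arithmetic on the four numbers shown; the row labels are taken as given, since the slide does not say whether "Home" and "Away" refer to the predicted or the actual winner.

```python
# Counts copied from the confusion matrix: (correct, incorrect) per row.
table = {"Home": (37, 6), "Away": (20, 4)}

correct = sum(right for right, _ in table.values())
total = sum(right + wrong for right, wrong in table.values())
overall_accuracy = correct / total

per_row_accuracy = {
    label: right / (right + wrong) for label, (right, wrong) in table.items()
}

print(f"overall: {correct}/{total} = {overall_accuracy:.3f}")
for label, accuracy in per_row_accuracy.items():
    print(f"{label}: {accuracy:.3f}")
```

That is 57 correct calls out of 67 test matches, roughly 85%, with similar rates in both rows.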
Interesting Results: Posterior distribution of attack strength AR process
Figure: Attack strength for each team
Interesting Results: Posterior distribution of defence strength AR process
Figure: Defence strength for each team
World Cup Predictions: Group stage 1
[Plot: one row per match; horizontal axis 0 to 150]
Match                       Model's odds                Bookmakers' odds
Russia / Japan              <1/1000, >9999/1, >9999/1   1/100, 48/1, 469/20
Argentina / France          7/1000, 479/1, 210/1        73/100, 497/25, 123/100
Fiji / Australia            6/1000, 592/1, 251/1        11/100, 1633/50, 23/4
South Africa / New Zealand  263/100, 19/1, 5/10         43/100, 533/25, 201/100
Tonga / England             <1/1000, >9999/1, >9999/1   1/100, 543/10, 3013/100
Scotland / Ireland          36/100, 4/1, 20/1           27/100, 2293/100, 31/10
Namibia / Italy             <1/1000, >9999/1, >9999/1   1/100, 219/4, 1823/100
Georgia / Wales             <1/1000, >9999/1, >9999/1   1/100, 2467/50, 81/5
Samoa / Russia              1795/100, 52/1, 1/10        1629/100, 1201/25, 1/50
Uruguay / Fiji              49/100, 26/1, 2/1           1/100, 461/10, 422/25
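Fractional odds of a/b conventionally correspond to an implied probability of b/(a + b), which makes the two columns easier to compare. The sketch below applies that convention to the South Africa / New Zealand row. The slides do not say which outcome each of the three positions refers to, so the code only converts and sums them. Entries such as `<1/1000`, `>9999/1` and `-` are not plain fractions and are deliberately not handled.

```python
from fractions import Fraction
from typing import List


def implied_probability(odds: str) -> Fraction:
    """Convert fractional odds 'a/b' to the implied probability b / (a + b)."""
    numerator, denominator = odds.split("/")
    a, b = int(numerator), int(denominator)
    return Fraction(b, a + b)


def implied_probabilities(triple: str) -> List[Fraction]:
    """Convert a comma-separated triple of fractional odds."""
    return [implied_probability(part.strip()) for part in triple.split(",")]


model = implied_probabilities("263/100, 19/1, 5/10")
bookmakers = implied_probabilities("43/100, 533/25, 201/100")

print([round(float(p), 3) for p in model], round(float(sum(model)), 3))
print([round(float(p), 3) for p in bookmakers], round(float(sum(bookmakers)), 3))
```

The model's three probabilities sum to about 0.99, consistent with rounding in the displayed odds. The bookmakers' sum is about 1.08, the excess being the usual bookmaker margin.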
World Cup Predictions: Group stage 2
[Plot: one row per match; horizontal axis 0 to 100]
Match                       Model's odds                Bookmakers' odds
Canada / Italy              <1/1000, >9999/1, >9999/1   2/25, 1999/50, 7/1
USA / England               <1/1000, >9999/1, >9999/1   1/100, 4771/100, 2007/100
Tonga / Argentina           <1/1000, 4479/1, 2280/1     3/100, 2399/50, 318/25
Ireland / Japan             6669/100, 216/1, <1/10      177/20, 1042/25, 3/50
Namibia / South Africa      <1/1000, >9999/1, >9999/1   -, 1911/25, 5613/100
Wales / Australia           11044/100, 262/1, <1/10     111/100, 1043/50, 21/25
Uruguay / Georgia           50/100, 23/1, 2/1           1/5, 784/25, 37/10
Samoa / Scotland            3/1000, 855/1, 622/1        3/25, 887/25, 269/50
Canada / New Zealand        <1/1000, >9999/1, >9999/1   -, 105/1, 8343/100
USA / France                <1/1000, >9999/1, >9999/1   3/100, 4353/100, 291/25
World Cup Predictions: Group stage 3
[Plot: one row per match; horizontal axis 0 to 75]
Match                       Model's odds                Bookmakers' odds
Fiji / Georgia              108/100, 17/1, 1/1          251/100, 521/25, 17/50
Russia / Ireland            <1/1000, >9999/1, >9999/1   -, 2247/25, 99/2
Italy / South Africa        7/100, 71/1, 18/1           1/50, 4413/100, 309/20
Argentina / England         <1/1000, >9999/1, >9999/1   1/10, 3331/100, 347/50
Uruguay / Australia         1/1000, 6021/1, 2510/1      -, 6841/100, 1939/50
Samoa / Japan               3/100, 126/1, 56/1          9/100, 2009/50, 168/25
Tonga / France              <1/1000, >9999/1, >9999/1   3/100, 861/20, 611/50
Namibia / New Zealand       <1/1000, >9999/1, >9999/1   -, 181/2, 353/4
Canada / South Africa       <1/1000, >9999/1, >9999/1   -, 8583/100, 159/2
USA / Argentina             1/100, 326/1, 110/1         2/25, 192/5, 141/20
World Cup Predictions: Group stage 4
[Plot: one row per match; horizontal axis 0 to 75]
Match                       Model's odds                Bookmakers' odds
Fiji / Wales                <1/1000, >9999/1, >9999/1   2/25, 906/25, 703/100
Russia / Scotland           <1/1000, >9999/1, >9999/1   1/100, 273/5, 592/25
Georgia / Australia         1/100, 302/1, 125/1         1/100, 2273/50, 801/50
France / England            4/100, 98/1, 34/1           -
Samoa / Ireland             1/1000, 2886/1, 2482/1      1/50, 159/5, 404/25
Italy / New Zealand         19/100, 37/1, 7/1           -
Canada / Namibia            178/100, 18/1, 7/10         23/25, 16/1, 93/100
Scotland / Japan            3084/100, 115/1, <1/10      73/50, 159/5, 404/25
Tonga / USA                 16/100, 44/1, 7/1           161/100, 471/25, 11/20
Uruguay / Wales             <1/1000, >9999/1, >9999/1   -, 58/1, 145/4
World Cup Predictions: Knockout stage: quarter finals and semi finals
[Plot: one row per match; horizontal axis 0 to 60]
Match                       Model's odds                Bookmakers' odds
Australia / England         <1/1000, 7416/1, 3313/1     31/100, 614/25, 139/50
Ireland / New Zealand       278/100, 21/1, 4/10         1/5, 686/25, 197/50
France / Wales              30/100, 19/1, 5/1           7/20, 562/25, 259/100
South Africa / Japan        22140/100, 495/1, <1/10     551/100, 3329/100, 13/100
New Zealand / England       2/100, 182/1, 57/1          239/100, 2307/100, 37/100
South Africa / Wales        34/100, 20/1, 4/1           257/100, 2473/100, 33/100
World Cup Predictions: Predictions for the final and third place playoff
Figure: Predictions: Final and third place
Summary
- Poisson model for score: a little underdispersed (as always).
- Hidden autoregressive processes filter past performance.
- Predictions surprisingly good considering not many predictors.
- But not that good (e.g. New Zealand).
- Future work: more predictor variables, try negative binomial model.
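One quick way to see why a negative binomial model is a natural next step: a Poisson distribution forces the variance to equal the mean, whereas a negative binomial with mean mu and size r has variance mu + mu²/r. The sketch below computes a variance-to-mean ratio and the negative binomial variance. The four scores are just the rows of the example data table, far too few to say anything about real dispersion, and r = 50 is an arbitrary illustrative value.

```python
from statistics import mean, pvariance
from typing import Sequence


def dispersion_index(scores: Sequence[float]) -> float:
    """Population variance divided by mean; equals 1 in expectation for Poisson data."""
    return pvariance(scores) / mean(scores)


def negative_binomial_variance(mu: float, r: float) -> float:
    """Variance of a negative binomial with mean mu and size parameter r."""
    return mu + mu * mu / r


example_scores = [19, 26, 34, 21]  # scores from the example data table
print(dispersion_index(example_scores))
print(negative_binomial_variance(mu=25.0, r=50.0))
```

A ratio well above 1 on real data would point towards the extra variance term.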