Just for fun, I thought I’d implement a Bayesian “statistical inference engine” for some sports tournament. For whatever reason, I came to choose ice hockey world championships, training my inference engine on data from the 2016 tournament, and testing its predictions against the results of the 2017 tournament.

One problem with this approach is that the set of participating teams usually changes from year to year. Since this is just a small experiment, I did not want to go through year after year of match data to find at least one game result for each pairing of the teams that played in the 2017 championships. For the same reason, to avoid lots and lots of boring data entry, I settled for just one game result per team pair in my training data. So, from the 8 teams in Group A 2016 that I chose as my results data, where each team met every other team once, I got 28 results in total, which clearly is not much to base predictions on. But the purpose of this experiment was not to get high accuracy in the predictions, but to experiment a bit with a slightly more complex inference scenario.
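As a quick sanity check on that number, here is a minimal sketch of how a single round robin among 8 teams yields 28 games. The team names are placeholders, not the actual Group A teams.

```python
from itertools import combinations
from math import comb

# Placeholder team names (assumption): the real Group A teams are not listed here.
teams = [f"T{i}" for i in range(1, 9)]

# Every unordered pair of teams meets exactly once in a single round robin.
pairings = list(combinations(teams, 2))
n_games = len(pairings)

# Same count from the closed form "8 choose 2".
assert n_games == comb(len(teams), 2)
print(n_games)
```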

Thus, with the small amount of data, and without having put much thought into my various priors, I wasn’t expecting much accuracy in the predictions.

A bit about the model (this time, I’m not going to publish the code, for reasons that will become clear in a minute… 😉):

The data the model is fitted to consists of 28 game results from the 2016 world championships. From those matches, I collect the point spread (the goal difference) for each game, not just the win/lose outcome.

Why not just the outcome? Because the point spread carries more information than the binary win/lose result. If I just know that team A beat team B, that’s one bit of information, while a result like 9-4, with a delta of 5, conveys much more information.
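To make the difference concrete, here is a small sketch that reduces a score to both representations. The function names are my own for illustration, and the 2-1 score is made up; only the 9-4 example comes from the text above.

```python
def goal_difference(home_goals, away_goals):
    """Signed margin from the home team's point of view."""
    return home_goals - away_goals


def outcome(home_goals, away_goals):
    """Collapse a score to +1 (home win), 0 (tie) or -1 (home loss)."""
    diff = goal_difference(home_goals, away_goals)
    return (diff > 0) - (diff < 0)


# 9-4 is the example from the text; 2-1 is a made-up narrow win.
scores = [(9, 4), (2, 1)]
margins = [goal_difference(h, a) for h, a in scores]
outcomes = [outcome(h, a) for h, a in scores]

print(margins)   # the two games differ here
print(outcomes)  # but look identical here
```

Both games count as the same "home win" in the binary view, while the margin keeps the blowout and the narrow win apart.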

Furthermore, I use the International Ice Hockey Federation’s ranking of teams for 2016 as one of the priors.
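One way a ranking could feed into a prior, purely as an illustration and not necessarily how my model does it: map each ranking position to a prior mean "strength" centered on zero, and take the difference of two strengths as the expected goal difference before seeing any data. The linear mapping, the `scale` parameter and the team names below are all assumptions.

```python
def rank_to_prior_mean(rank, n_teams, scale=1.0):
    """Map a ranking position (1 = best) to a prior mean strength.

    Hypothetical linear mapping: the middle of the ranking gets 0,
    better-ranked teams get positive values, worse-ranked negative.
    """
    midpoint = (n_teams + 1) / 2
    return scale * (midpoint - rank)


# Placeholder ranking (assumption), not the actual 2016 IIHF ranking.
ranks = {"T1": 1, "T2": 2, "T3": 3, "T4": 4}
prior_means = {
    team: rank_to_prior_mean(rank, len(ranks)) for team, rank in ranks.items()
}

# Prior expectation of the goal difference when T1 hosts T4.
expected_diff = prior_means["T1"] - prior_means["T4"]
print(prior_means)
print(expected_diff)
```

In a full model these values would only be the centers of the prior distributions; the game data then pulls each team's strength away from its ranking-based starting point.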

Here’s the prediction result, coming from a few hundred lines of Python and PyMC code:

On the vertical axis are the 28 games from 2016. The vertical red dashed line is “point zero”: if the blue dot is to the right of that line, the “home team” (the team on the left) is predicted to win; if the blue dot is to the left of the red line, the home team is predicted to lose. Each blue dot has a 95% credible interval, illustrated by the blue lines.

The red dots represent the training data, that is, the game outcomes from the 2016 tournament. They are there just for easy comparison with the predicted (blue) results, and would normally, in a “real” prediction scenario, not be there.
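For readers wondering how a blue dot and its interval can be derived, here is a minimal sketch using only the standard library. It assumes the sampler has produced a list of draws of the predicted goal difference for one game; the draws below are simulated stand-ins, not output from my model.

```python
import random
import statistics


def summarize(draws):
    """Reduce draws of a predicted goal difference to a dot and an interval."""
    # With n=40 the cut points sit at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(draws, n=40)
    mean = statistics.fmean(draws)
    return {
        "mean": mean,                  # the dot
        "low": cuts[0],                # 2.5th percentile
        "high": cuts[-1],              # 97.5th percentile
        "home_win": mean > 0,          # right of the zero line
        "p_home_win": sum(d > 0 for d in draws) / len(draws),
    }


# Simulated stand-in for sampler output (assumption): 4000 draws around +1.5 goals.
rng = random.Random(0)
draws = [rng.gauss(1.5, 2.0) for _ in range(4000)]

summary = summarize(draws)
print(summary)
```

The fraction of draws above zero is a handy extra: it says how sure the model is about the winner, which the dot alone does not show.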

So, how did the model do? In fact, much better than I expected, given the very limited amount of data, the minimal thought I’ve given to the priors, and last but not least: as anyone interested in betting knows, it is notoriously hard to predict sports events. Even the world’s best team can occasionally lose against a team with a ranking around 50…

To check the predictive power of the model, I compared its predictions with the results from the 2017 world championships. As noted above, not all teams from 2016 were present, nor did all those that were present in 2016 play each other in 2017, so out of my 28 training games from 2016 there were only 11 matching games in 2017.

Nevertheless, the model correctly predicted 7 of the 11 possible outcomes, that is, the model got it right in 64% of the cases. And as any gambler knows, that kind of “house advantage” is enormous…!
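The arithmetic behind that figure, plus one piece of context, can be sketched as follows. The helper `prob_at_least` is my own illustrative name; it computes how often pure coin-flip guessing would get at least as many games right, assuming every game is an independent 50/50 call.

```python
from math import comb

n_games = 11    # matching games in 2017
n_correct = 7   # outcomes the model got right

accuracy = n_correct / n_games


def prob_at_least(k, n, p=0.5):
    """Binomial tail: probability of k or more successes in n independent tries."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that coin-flip guessing (assumption: p = 0.5 per game) does as well.
p_by_chance = prob_at_least(n_correct, n_games)

print(f"accuracy: {accuracy:.0%}")
print(f"P(at least {n_correct} of {n_games} by coin flip): {p_by_chance:.2f}")
```

With these numbers the tail probability comes out at roughly 0.27, so 11 games are too few for firm conclusions, which fits the small-experiment spirit of this post.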

So, the reason I’m not publishing any code this time is that I might consider putting some real thought into enhancing the model, e.g. to predict the upcoming 2018 hockey championships, or perhaps even more interesting: the 2018 FIFA world cup… 🙂