Scientific Gambling – Ice Hockey World Championships starting tomorrow


The tournament starts tomorrow with four games. From now on, future posts on this topic will appear only on the public Facebook group Scientific Gambling on Ice Hockey World Championships 2018.

So, if you want to continue following how my Bayesian Inference engine performs in its attempts to scientifically predict the outcomes of the games, and how much money I’m going to win – or lose – on my gambling, check out the FB group.


Scientific Gambling – “House Advantage”

In the previous post we looked at how betting shops, casinos etc. make money: fundamentally, by ‘salting’ the odds just a tiny bit in their favor.

Let’s use two very simple games to illustrate how this works: tossing coins and throwing dice.

Let’s start with a game of coin toss, assuming a fair coin, i.e. one where the probability of getting heads or tails is fifty-fifty: 0.5 each. The corresponding ‘straight’ odds for this game are thus 2 (decimal format), or 1:1 (fractional format). By ‘straight’, I mean odds that are directly given by the outcome probabilities. However, no professional betting shop can issue straight odds; if they did, they would very soon go out of business, simply because playing at straight odds is a zero-sum game, that is, in the long run the expected value of the game is zero, for both the player and the house.

How to see that…? Let’s pretend we play coin toss repeatedly, very many times. In the long run, we should expect to win half (50%) of the games and lose the other 50%. As an example, let’s say we play 100 times, the stake per game is 1$, and we have fair odds, that is 2 (the inverse of the 50% probability). So these 100 games, where each game costs 1$, will cost us 100$. Statistically, we should win 50 times and lose 50 times. For this to be a fair game, i.e. a zero-sum game, we need to win back the cost of all 100 games, that is 100$, within the 50 winning games. Dividing 100$ by 50 wins means that each win should return 2$, which is exactly what happens with straight odds, i.e. odds of 2, for this game.

So, in order to make money, the betting shops do not use straight odds, but instead ‘salt’ the odds a tiny bit in their favor, to ensure a healthy margin. Let’s mark up the probabilities by 5%, i.e. divide the odds by a markup factor of 1.05. Now, instead of odds of 2, we have odds of about 1.90 for both outcomes. Still, assuming a fair coin, we can expect to win 50% of the time. So, again, with 100 games we should expect to win 50. The cost of the 100 games is still 100$, but the returns now are not 50 * 2 == 100$, but 50 * 1.90 == 95$. That is, in the long run, we should expect to lose 5% of our stake. That markup on the odds, here 5%, thus generates a house advantage of 5%, which is the raison d’être of any professional betting shop or casino.

For the game of throwing dice the same thing applies: the probability of a fair die landing on any given one of its 6 sides is 1/6, which means the fair betting odds are 6 (decimal), or 5:1 (fractional). In order to make a profit, the betting shop must mark up the odds, exactly as in the coin-tossing example above.

The two graphs below show simulations of 1,000,000 games of coin tossing and dice throwing, where the red lines show the cumulative result (total win/loss) over the 1,000,000 games. As can be seen, with fair odds both games stay very close to zero-sum, while with a 5% markup the house makes a healthy profit of 5%, i.e. about 50,000 $.

[Figure: cumulative win/loss over 1,000,000 simulated coin tosses and dice throws, at fair odds vs. odds with a 5% markup]
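For the curious, a minimal sketch of such a simulation – not the exact code behind the graphs; the seed, the stake and the vectorized bookkeeping are my own choices here:

import numpy as np

rng = np.random.default_rng(42)
n_games, stake = 1_000_000, 1.0

# Coin toss: the player wins with probability 0.5
coin_wins = rng.random(n_games) < 0.5
for odds in (2.00, 1.90):                  # fair odds vs. odds with a 5% markup
    # per-game result: odds * stake back on a win, nothing on a loss, minus the stake
    result = np.where(coin_wins, odds * stake, 0.0) - stake
    print(f"coin, odds {odds:.2f}: player total {result.sum():+,.0f} $")

# Dice: win probability 1/6, fair odds 6, marked-up odds 6 / 1.05
dice_wins = rng.integers(1, 7, n_games) == 6
for odds in (6.00, 6.00 / 1.05):
    result = np.where(dice_wins, odds * stake, 0.0) - stake
    print(f"dice, odds {odds:.2f}: player total {result.sum():+,.0f} $")

With fair odds the totals hover around zero; with the markup, the player loses roughly 5% of the total stake, i.e. about 50,000 $.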


Scientific Gambling – how do betting shops make money….?

Betting shops are commercial businesses, that is, they want to and must make money in order to survive. Like any other business. So take a casino as an example: they make money – in the long run – by having set the odds just a tiny bit in their favor; the typical “house advantage” in a game like Roulette is 5.26% (American Roulette) or 2.70% (European Roulette). What the house advantage tells us is the relative amount a player is expected to lose on each play. In the long run. Thus, making a bet of 1$, you should expect to lose about 5 or 3 cents each time. Over time, these tiny wins for the casino accumulate to quite a lot of money. I have no idea what turnover per day the typical Las Vegas casino has, but let’s say 10,000,000 USD. 5% of that is 500,000 USD – not bad for spinning a few wheels…

The cool thing about games like Roulette is that they are based on “known unknowns”, where the risk is fully understood mathematically, i.e. all the probabilities involved are fully known.
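As a quick sanity check of those percentages: a single-number roulette bet pays 35:1, i.e. returns 36 units per unit staked on a win, out of 37 (European) or 38 (American) pockets:

# Straight-up roulette bet: 35:1 payout means 36 units back (35 profit + stake) on a win
for name, pockets in (("European", 37), ("American", 38)):
    ev = 36 / pockets - 1        # expected return per unit staked
    print(f"{name} roulette: house advantage {-ev:.2%}")
# European roulette: house advantage 2.70%
# American roulette: house advantage 5.26%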

In sports betting, on the other hand, the probabilities are not known, since we are dealing not with risk but with uncertainty, i.e. we are dealing with Unknown Unknowns. So how do betting shops like Unibet, Svenska Spel and others make money on sports betting…?

Easy: just apply a markup to the probabilities/odds. Below is an illustration, from a simulation run on my computer:

The blue line shows the cumulative returns given “fair odds”, i.e. odds that are a direct reflection of the probability of the game outcome: for instance, if the probabilities for a given game are believed to be 1/3 each for WIN, DRAW and LOSS, then each of these outcomes would have odds set to 3 (decimal), or 2:1 (fractional) – that is, you’d get 3$ back for your 1$ stake if you happened to win. As can be seen from the graph, after a million games the blue line ends up a little bit above 0, meaning that the player, i.e. you, would in this case leave the casino – or betting shop – with a small gain.

The red line shows what happens to expected returns when I have applied a tiny markup to the probabilities/odds, as all betting shops and casinos do: the line grows almost monotonically toward the negative side, i.e. constantly accumulating wins for the casino/betting shop. That’s the “house advantage” at play. In this simulation run I’ve set the house advantage, or markup, to a ridiculously low value; regardless, the result is clear: the house is making money.
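The conversion itself is a one-liner; a sketch using the 1/3–1/3–1/3 example from above (the 5% markup here is just for illustration, not the value used in the simulation):

def to_odds(prob, markup=1.0):
    # decimal odds for an outcome probability; markup > 1 'salts' them in the house's favor
    return 1.0 / (prob * markup)

for outcome in ("WIN", "DRAW", "LOSS"):
    print(outcome, round(to_odds(1 / 3), 2), round(to_odds(1 / 3, markup=1.05), 2))
# fair odds 3.0 vs. marked-up odds ~2.86 for each outcome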

Anyone wanting to take a guess on the house advantage set in this example…? 🙂

But remember that in sports betting we are dealing with Unknown Unknowns. That means that to safeguard against potentially huge losses due to all the uncertainty involved in sports game outcomes, the betting shops need to apply a fairly hefty markup to their odds; otherwise they would run a clear risk of going bankrupt whenever they have set the odds way too high.

“Prediction is difficult, particularly about the future”.

[Figure: cumulative returns over a million simulated games – fair odds (blue) vs. marked-up odds (red)]


Scientific Gambling – how to identify potentially profitable odds/plays?

In all sports gambling, success or failure is determined by a number of factors, luck not being the least of them, since in any sport there are loads of “Unknown Unknowns”, which we could also call “Uncertainty”. And then there is randomness.

If we compare sports betting with casino-type betting: in casino betting we can talk about “Risk” instead of “Uncertainty”, since for casino games like roulette we can mathematically calculate all the probabilities involved. In sports betting, however, no matter how much data we have, we have no way to mathematically derive the exact odds, because there are many Unknown Unknowns involved in each game. Thus, sports betting is a truly probabilistic endeavor, fully dominated by uncertainty, as opposed to calculable risk.

But, despite the presence of the Unknown Unknowns, we can attempt to do the best we can, by utilizing data that we have available, such as rankings, historical results etc. This is what my Bayesian Inference Engine does, basically calculating the most likely outcomes for any game, given the rankings and historical results (as well as the statistical model I’ve given it).

So, let’s take a concrete example of how I decide upon which games to bet on:

First, I execute my Bayesian Engine on the rankings and the historical game data I’ve gathered – about 1100 world championship games from 2000 to date. Basically, what the engine does is run millions of simulations, trying to come up with model parameters that generate the same results as the historical games. Once these parameters are known, they are used to predict the outcomes of new games. Such a run can take anything from a few hours to days (or even weeks), depending on how many games the model is predicting, how much data there is, and how “deep” I’ve told it to go in its analysis. A typical run covering each day’s 4-6 games takes about 10 hours of CPU time on my machine.
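To give a flavor of what such an inference can look like – this is a toy sketch with made-up teams and scores, not my actual model – here a minimal PyMC3-style version that infers a latent strength per team from goal differences:

import numpy as np
import pymc3 as pm

# Made-up data: indices into a list of 6 teams, plus observed goal differences
home_idx  = np.array([0, 1, 0, 2])           # e.g. RUS, SWE, RUS, GER
away_idx  = np.array([3, 4, 1, 5])           # e.g. FRA, BLR, SWE, DEN
goal_diff = np.array([4.0, 3.0, 1.0, -1.0])  # home goals minus away goals

with pm.Model():
    # one latent 'strength' per team; a game is a noisy comparison of two strengths
    strength = pm.Normal("strength", mu=0, sd=1, shape=6)
    pm.Normal("obs", mu=strength[home_idx] - strength[away_idx],
              sd=2, observed=goal_diff)
    trace = pm.sample(2000, tune=1000)

# Crude prediction for a new game between teams 0 and 3, from the posterior samples
gap = trace["strength"][:, 0] - trace["strength"][:, 3]
print("P(close game, |strength gap| < 0.5):", np.mean(np.abs(gap) < 0.5))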

When the Bayesian Inference in the above step is done – it is typically run overnight – it’s time to analyze the predictions and compare them with those of a commercial betting shop, in my case Unibet, trying to identify games where my program has found an advantage in the odds setting.

The first thing I do is check for games where Unibet’s predicted probabilities and mine differ:

[Figure: stacked outcome-probability bars per game, my models alongside Unibet’s]

Instead of looking at the massive amount of numbers in rows and columns, the above graph gives me a quick overview of where, according to my program, there might be an odds setting that is advantageous to me: basically, I look for the differently colored sub-bars and compare their sizes. For instance, here I can see that for the game RUS-FRA, my program puts the probability of a draw quite a bit higher than Unibet does; thus, that might be a candidate for placing a bet. Let’s have a closer look:

[Figure: differences in outcome probabilities per game, my program minus Unibet]

The graph above shows the differences in probabilities between Unibet’s predictions and mine. The ones that are potentially interesting are those on the plus side, i.e. where my program has found a higher probability than Unibet. RUS-FRA is interesting, as are SWE-BLR and SUI-AUT. Let’s dig deeper, using RUS-FRA as our example:

[Figure: expected value per outcome for the RUS-FRA game]

Here we can see that, given my program’s prediction of the outcome of RUS-FRA vs. the prediction on which Unibet bases its odds, there is an opportunity to make money – IF MY PROGRAM’S PREDICTION HAPPENS TO BE CORRECT AND THE SMALL MIRACLE OF FRA NOT LOSING HAPPENS! – since, according to my program, the probability of a draw is higher than Unibet believes, and therefore they have set the odds for a draw higher than they should be, as my prediction sees the outcome of that game.

Of course, RUS is still a huge favorite to win, for Unibet as well as for my program – 8/10 vs 7/10 wins for Russia, respectively – but that small difference is exploitable as a high-risk/high-reward gamble, since the odds are set for 8/10 wins, not the 7/10 wins my program predicts.
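In numbers, the reasoning looks roughly like this (probabilities rounded for illustration):

p_unibet, p_mine = 0.10, 0.18   # RUS-FRA draw: Unibet's implied probability vs. mine
odds = 1 / p_unibet             # 10.0 - the draw odds Unibet's belief translates to
ev = p_mine * odds - 1          # expected profit per unit staked, IF my probability is right
print(f"odds {odds:.2f}, EV per unit: {ev:+.2f}")   # +0.80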

Let’s zoom in a bit closer to see why my program might put a somewhat higher probability on a draw than Unibet does:

[Figure: historical world championship results between Russia and France]

Russia and France have met 4 times at world championship level since 2000, and Russia have won all but one, the 2013 game. Now, I have no insight into how Unibet’s odds compilers weigh this type of “anomaly”, but that single loss could be the difference between the predictions.

Anyways, it’s interesting enough for me to put some money on that game. Yes, the likelihood of winning the bet is fairly small – after all, my program thinks that RUS will beat FRA in more than 7 out of 10 games – but the upside is quite large, and thus worth the calculated risk. As stated above, sports betting has loads of Unknown Unknowns, and who knows, perhaps I’m getting lucky….? 🙂


Scientific Gambling on Ice Hockey Worlds – identifying potentially exploitable games

One of the most difficult aspects of dealing with lots of data is presenting the information obtained from various computations in a clear and meaningful way. For instance, in order to identify games where there is a potentially exploitable gambling advantage between the odds given by Unibet and the probabilities obtained from my Bayesian engine, I must compute all predicted game outcomes (WIN/DRAW/LOSS) for both Unibet and my system, and then identify potentially exploitable differences. The graph below is a new attempt to consolidate all this information into a single graph.
Each game has potentially 4 bars. The 3 leftmost are predictions from my system, where the third is the average prediction of the two statistical models I use. The rightmost bar (where it exists) is Unibet’s odds converted to probabilities (taking into account the markup that all betting shops place on their odds).
So, with this graph, the basic process for identifying potentially exploitable games is to compare each of the colored sub-bars from my program with the corresponding Unibet bar. Those bets where any of my sub-bars is taller than Unibet’s are the high-risk/high-reward games that might be exploitable.
[There are two reasons why not all games have all four bars: either there are no previous historical games between the two teams, as for FIN-KOR, or Unibet has not yet published its odds, as for FRA-BLR and DEN-USA.]
[Figure: per-game outcome probabilities – three bars from my system plus Unibet’s odds converted to probabilities]
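For what it’s worth, this kind of stacked-bar comparison takes only a few lines of matplotlib; a sketch with made-up probabilities for a single game:

import numpy as np
import matplotlib.pyplot as plt

bars = ["model 1", "model 2", "average", "Unibet"]
# hypothetical WIN/DRAW/LOSS probabilities, one row per bar
probs = np.array([[0.60, 0.25, 0.15],
                  [0.70, 0.15, 0.15],
                  [0.65, 0.20, 0.15],
                  [0.75, 0.15, 0.10]])

bottom = np.zeros(len(bars))
for i, outcome in enumerate(("WIN", "DRAW", "LOSS")):
    plt.bar(bars, probs[:, i], bottom=bottom, label=outcome)  # stack the sub-bars
    bottom += probs[:, i]
plt.ylabel("probability")
plt.legend()
plt.show()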

Scientific Betting on Ice Hockey Worlds now on Facebook

Scientific Gambling on Ice Hockey World Championships 2018


Scientific Gambling on Hockey Worlds – Expected profits from games of day 1 & 2

An Expected Value calculation gives the expected gains from my bets on the games played during the first two days of the tournament as follows:

          OUTCOME  U_ODDS       U_P         P   P_DELTA  EV_PER_UNIT
HOME AWAY
CZE  SVK     DRAW    5.20  0.192308  0.243738  0.051430     0.267438
     SVK     LOSS    5.80  0.172414  0.169599 -0.002814    -0.016324
GER  DEN     DRAW    4.25  0.235294  0.274173  0.038879     0.165237
     DEN     LOSS    3.20  0.312500  0.341512  0.029012     0.092838
NOR  LAT     DRAW    4.00  0.250000  0.291737  0.041737     0.166949
RUS  FRA     DRAW   10.00  0.100000  0.184241  0.084241     0.842407
SUI  AUT     DRAW    5.50  0.181818  0.320461  0.138643     0.762537
     AUT     LOSS    6.75  0.148148  0.147649 -0.000499    -0.003367
SWE  BLR     DRAW    8.00  0.125000  0.209213  0.084213     0.673707
     BLR     LOSS   11.00  0.090909  0.149934  0.059025     0.649274

In the table above, the ‘OUTCOME’ column shows my bet, from the perspective of the ‘home’ team; ‘U_ODDS’ are the odds given by Unibet; ‘U_P’ is those odds converted to a probability; ‘P’ is the probability my Bayesian model gives the outcome; ‘P_DELTA’ is the difference between Unibet’s belief and my program’s belief about the outcome probability; and ‘EV_PER_UNIT’ is the expected value per unit stake.
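The derived columns are straightforward to recompute from ‘U_ODDS’ and ‘P’ alone; a sketch, using the CZE-SVK and RUS-FRA draw rows as examples:

import pandas as pd

bets = pd.DataFrame(
    {"U_ODDS": [5.20, 10.00],          # Unibet's decimal odds
     "P": [0.243738, 0.184241]},       # my model's probability for the same outcome
    index=["CZE-SVK DRAW", "RUS-FRA DRAW"])

bets["U_P"] = 1 / bets["U_ODDS"]                      # odds as implied probability
bets["P_DELTA"] = bets["P"] - bets["U_P"]             # my belief minus Unibet's
bets["EV_PER_UNIT"] = bets["P"] * bets["U_ODDS"] - 1  # expected profit per unit staked
print(bets)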

For these initial games, the calculated profit margin for my bets is 22%.

Below, the Expected Value in graphical form.

(The reason FIN-KOR has no bars in the graph is that those two teams have never met before at championship level, so there is not enough data for meaningful inference.)

[Figure: expected value per unit stake for each bet]


Scientific Gambling on Ice Hockey Worlds – Bets for games of May 5th


Summary

I’m using mathematical & statistical methods, more specifically, Bayesian Inference, Markov Chain Monte Carlo, simulation and Probabilistic Programming, attempting to predict the game outcomes of the upcoming Ice Hockey World Championships, starting May 4th.

Based on the findings of my computations, I’m betting – and thus putting some Skin-In-The-Game – on selected high-odds games.

Strategy

To summarize my betting strategy: take a lot of calculated risk by betting only on high-odds games, thereby expecting to lose most of the bets, hoping that the few wins at high odds will compensate for the many losses. Hedge where the Bayesian Inference indicates so.

Bets placed for May 5th’s games

Games, Bets & Unibet Odds

  • NOR – LAT, @ DRAW, odds : 4.00
  • SUI – AUT, @ DRAW, odds : 5.50
  • SUI – AUT, @ AUT WIN, odds : 6.75
  • CZE – SVK, @ DRAW, odds : 5.20
  • CZE – SVK, @ SVK WIN, odds : 5.80

Previously placed bets

May 4th’s games

  • RUS – FRA, @ DRAW, odds : 10.00
  • SWE – BLR, @ DRAW, odds : 8.00
  • SWE – BLR, @ BLR WIN, odds : 11.00
  • GER – DEN, @ DEN WIN, odds : 3.20
  • GER – DEN, @ DRAW, odds : 4.25


Bayesian Prediction – wanna bet…? Putting your money where your mouth is…

[A disclaimer: I know virtually nothing about contemporary ice hockey – my interest faded when Börje Salming put his skates on the shelf a couple of decades ago – so I have not included any personal hockey insights in the predictions, e.g. the teams’ current form. My predictions are purely based on:

  • 1) Stats & Mathematics
  • 2) current IIHF Ranking table
  • 3) about 1100 historical championship game results from 2000 and forward
  • 4) a few thousand lines of Python and PYMC code ]

So, after almost four weeks of intense development, my Bayesian Inference Engine for the soon-to-start IIHF 2018 World Championships is about ready for action.  I believe.

Thus, I will use it to predict the outcome of each game during the upcoming championships using its Bayesian Inference Model, and publish the predictions – obviously ahead of the game..! – here.  All this in the hopes of making some serious money by betting according to the predictions from my statistical model….

That objective leads to a well-defined betting strategy: I’m going to focus solely on high-odds games, thus taking a large amount of hopefully well-calculated risk, expecting thereby to make huge gains on those bets that actually go my way. This strategy means I’m not going to play any of the low-odds games at all, no matter how “safe” they appear.

To summarize the strategy: take a lot of calculated risk by betting only on high-odds games, thereby expecting to lose most of the bets, hoping that the few wins at high odds will compensate for the many losses.

So, for those games where my program predicts a result that indicates an advantage over the odds of a specific professional betting shop, Unibet, I will make bets according to the predictions of my program, and publish the outcomes of those bets here. That’s simply “Skin-In-The-Game”, or “Putting Your Money Where Your Mouth Is” – standing up for one’s predictions/beliefs with real skin in the game, as opposed to being just a normal pundit with nothing to lose from erroneous predictions…. 🙂

Anyways, the championships start May 4th, with 4 games [Odds @ Unibet W/D/L]

  • Russia – France  ODDS:[1.12/10/15]
  • USA – Canada  ODDS:[5.30/5.10/1.47]
  • Sweden – Belarus  ODDS:[1.18/8/11]
  • Germany – Denmark ODDS:[1.94/4.25/3.20]

So, let’s compare the odds above, given by Unibet, with the predictions from my statistical inference. In fact, my program computes two predictions – one based on the historical spreads, the other on the scores of each individual historical game between the teams – thus the two graphs below. (Btw, don’t bother about the x-axis values; they do not correspond to anything real, at least not directly – they are scaled in various ways to hopefully make the prediction better…)

Anyways: looking at the graphs below – one of the outputs from my program – both RUS-FRA and SWE-BLR seem very much in line with the Unibet predictions, so there is no point in playing those low odds. The same goes for USA-CAN, where my program gives even higher odds than Unibet for a USA win or a draw.

But there is one (1) game that looks more interesting from a “trying to make money on betting” perspective: GER-DEN. The odds given by Unibet for that game ending in a draw are 4.25, while the average of my two models predicts 3.4 for a draw. Furthermore, a DEN win pays 3.20 according to Unibet, while my model predicts about 2.7.

Decision time: I’ll bet 1 unit on a draw, and 1 unit on DEN winning!

(how much I’m actually betting in real money will remain my secret – after all, I don’t want the tax authorities after me…! 😉 )

So, my bets will cost me 2 “units”. According to my model, I have about a 30% chance of winning the draw bet, which would give me 4.25 “units” back, i.e. a win of 2.25 times my total stake. And I have about a 35% chance of winning my bet on DEN winning the game, giving me 3.20 units back, i.e. a win of 1.20 times my money.

Of course, I also have about a 35% chance (or risk) of losing both bets, if GER wins – but no guts, no glory…

Another way to put it is in terms of EV, expected value, i.e. the expected return on each “unit” of stake – in the long run, of course:

  • GER – DEN DRAW: EV = 0.29
  • GER – DEN DEN WIN: EV = 0.12

Meanwhile, e.g. USA – CAN has an EV of -0.42, which again confirms my decision not to play that game.

I’d like to end this post with Disclaimer II:

“Prediction is difficult, particularly about the future”

Results of my betting will be published as soon as the games are finished on May 4th.

[EDIT April 25th: After having written a utility for Expected Value analysis last night, I decided to add a few bets on the first day’s games, so the full betting list now looks like:

  • RUS-FRA : bet on draw at odds 10
  • SWE-BLR : bet on draw at odds 8
  • SWE-BLR : bet BLR winning at odds 11
  • GER-DEN : bet DEN winning at odds 3.20
  • GER-DEN : bet on draw at odds 4.25

END EDIT]

[Figures: prediction plots from my two models for the opening games]


Bayesian updating with PYMC

I’ve been looking for a neat way to update a Bayesian prior from a posterior sample for a while, and just the other day I managed to find what I was looking for: a code example that shows how to turn a posterior sample into a distribution known to PYMC. The code for doing that, by jcrudy, can be found here. The key insight is the “known to PYMC” part: it’s no problem to fit a distribution to a posterior sample, but in order to make it usable in further PYMC inference, it needs to be wrapped as a PYMC object. And that’s what jcrudy’s code does.

Anyway, below a small example, borrowed from Richard McElreath’s excellent book “Statistical Rethinking”.

Let’s pretend we want to know the ratio of water to land on our Earth, and that the way we’ll go about it is to toss an inflatable globe into the air a number of times, keeping track of whether we first touch water or land when we catch it.

Let’s say that after the first 9 tosses, our results look like below:

[1,0,1,0,1,0,1,1,1] where ‘1’ indicates water, and ‘0’ land.

The basic logic of our Bayesian Inference is to update the prior each time we get a new data point, i.e. either a ‘1’ or a ‘0’.

Let’s furthermore say that we have an initial idea of the correct ratio being somewhere around 50%, but we are far from certain, so we set a fairly “loose” initial prior: a Beta distribution centered on 50%, as below. This prior allows the calculated ratio to fall almost anywhere in the range 0% to 100%, with the highest probability around 50%.

[Figure: the initial Beta prior, centered on 50%]

Now, what we want to do is to update our prior belief – the one initially being centered around 50% – based on each subsequent data point.

Below are the results of 18 such updates (the above vector of 9 points is concatenated with itself to provide the 18 data points):

[Figure: 18 panels showing the prior updated after each successive data point]

This graph is read from left to right, and shows the updated prior – the current belief about the correct ratio – after each subsequent data point has been processed.

From the top-left panel, we can see that after the first data point, which happens to be a ‘1’, has been processed, the Bayesian model has shifted its belief about the correct ratio somewhat to the right, from 50% to 55%; after processing the ‘0’ of the second data point, it has pulled the current belief back to 50%. And so on and so forth as the remaining data points are processed.

Finally, after having processed all 18 data points, updating the prior for each point, the Bayesian model settles for a ratio of 64% (mean value). But the real benefit of Bayesian analysis is that we actually get a probability distribution as a result, not just a point estimate. Thus, the Bayesian approach preserves the underlying uncertainty of the model in a very neat, visually obvious way.

If you look closely at the 18 different plots above, you will see that the distribution gets narrower as each data point is processed. This means the Bayesian model is gradually getting more and more certain about the sought after ratio, as it gets more data to base its estimate on.

Code below.
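What follows here is not the full PYMC implementation, but a simplified conjugate Beta-Binomial sketch of the same updating logic; the Beta(2, 2) shape for the “loose” prior is my assumption, so the per-step numbers will differ slightly from the graphs above:

from scipy import stats

data = [1, 0, 1, 0, 1, 0, 1, 1, 1] * 2   # the 9 tosses, concatenated to 18 points
a, b = 2.0, 2.0                           # loose Beta(2, 2) prior centered on 50%

for i, x in enumerate(data, start=1):
    a += x                                # a '1' is one more water observation
    b += 1 - x                            # a '0' is one more land observation
    lo, hi = stats.beta(a, b).interval(0.95)
    print(f"after point {i:2d}: mean {a / (a + b):.0%}, 95% interval [{lo:.0%}, {hi:.0%}]")

# with this prior, the final mean lands at ~64% water, and the interval narrows
# as more data points are processed - the behavior seen in the panels above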

