ÅF Offshore Race – S/Y Singdoy WP ETA Prediction

Predicted waypoint-by-waypoint ETAs for the course. LAT/LON are the waypoint coordinates, COG the course over ground in degrees, DIST and CUMDIST the leg and cumulative distances in nautical miles, LEGTIME and TOT_TIME the corresponding durations in hours (the DIST/LEGTIME ratios imply an assumed average boat speed of about 7 knots), and TIMEDELTA the total elapsed time as a pandas timedelta.
     LAT   LON  COG  DIST  LEGTIME  CUMDIST  TOT_TIME              TIMEDELTA
0  59.32 18.09    0   nan      nan      nan       nan                    NaT
1  59.32 18.11  103  0.84     0.12     0.84      0.12 0 days 00:07:14.260800
2  59.32 18.16   83  1.45     0.21     2.30      0.33 0 days 00:19:41.160000
3  59.34 18.21   57  1.72     0.25     4.02      0.57 0 days 00:34:28.027199
4  59.35 18.24   48  1.36     0.19     5.39      0.77 0 days 00:46:09.969600
5  59.36 18.27   54  1.00     0.14     6.38      0.91 0 days 00:54:43.318800
6  59.37 18.32   70  1.60     0.23     7.99      1.14 0 days 01:08:27.783600
7  59.37 18.35  100  0.92     0.13     8.91      1.27 0 days 01:16:20.308800
8  59.37 18.37   78  0.80     0.11     9.70      1.39 0 days 01:23:09.283200
9  59.37 18.39  100  0.44     0.06    10.14      1.45 0 days 01:26:56.882400
10 59.36 18.41  109  0.75     0.11    10.90      1.56 0 days 01:33:24.627600
11 59.36 18.44  103  0.94     0.13    11.84      1.69 0 days 01:41:29.976000
12 59.37 18.45   35  0.60     0.09    12.44      1.78 0 days 01:46:37.246800
13 59.40 18.45  355  1.69     0.24    14.13      2.02 0 days 02:01:06.402000
14 59.43 18.38  320  3.05     0.44    17.18      2.45 0 days 02:27:17.182800
15 59.43 18.39   85  0.40     0.06    17.58      2.51 0 days 02:30:41.983199
16 59.43 18.41  129  0.57     0.08    18.15      2.59 0 days 02:35:32.841600
17 59.43 18.42  117  0.50     0.07    18.64      2.66 0 days 02:39:48.715200
18 59.42 18.47   96  1.47     0.21    20.12      2.87 0 days 02:52:26.234400
19 59.42 18.48   90  0.34     0.05    20.46      2.92 0 days 02:55:22.040400
20 59.41 18.52  122  1.42     0.20    21.88      3.13 0 days 03:07:32.091600
21 59.39 18.55  140  1.41     0.20    23.29      3.33 0 days 03:19:36.879600
22 59.39 18.57  121  0.54     0.08    23.83      3.40 0 days 03:24:13.053600
23 59.38 18.59  120  0.80     0.11    24.63      3.52 0 days 03:31:05.987999
24 59.37 18.61  125  0.80     0.11    25.43      3.63 0 days 03:37:59.746800
25 59.38 18.65   79  1.34     0.19    26.78      3.83 0 days 03:49:30.054000
26 59.38 18.67   86  0.51     0.07    27.28      3.90 0 days 03:53:52.069200
27 59.38 18.69   83  0.49     0.07    27.77      3.97 0 days 03:58:03.414000
28 59.37 18.71  103  0.74     0.11    28.52      4.07 0 days 04:04:26.529600
29 59.37 18.74  102  0.80     0.11    29.32      4.19 0 days 04:11:16.861200
30 59.37 18.75  111  0.58     0.08    29.89      4.27 0 days 04:16:13.166400
31 59.36 18.77  140  0.92     0.13    30.81      4.40 0 days 04:24:06.192000
32 59.31 18.82  153  2.99     0.43    33.80      4.83 0 days 04:49:44.367600
33 59.31 18.83  109  0.34     0.05    34.14      4.88 0 days 04:52:39.316800
34 59.30 18.85  114  0.92     0.13    35.07      5.01 0 days 05:00:34.138800
35 59.30 18.87  105  0.60     0.09    35.67      5.10 0 days 05:05:44.635200
36 59.30 18.89  105  0.54     0.08    36.21      5.17 0 days 05:10:23.052000
37 59.30 18.90   93  0.40     0.06    36.61      5.23 0 days 05:13:50.512800
38 59.30 18.92   97  0.50     0.07    37.11      5.30 0 days 05:18:05.382000
39 59.30 18.94   96  0.57     0.08    37.68      5.38 0 days 05:23:00.405600
40 59.30 18.96   87  0.63     0.09    38.32      5.47 0 days 05:28:26.130000
41 59.30 18.96  118  0.17     0.02    38.48      5.50 0 days 05:29:51.752400
42 59.29 18.97  133  0.48     0.07    38.96      5.57 0 days 05:33:57.142799
43 59.29 18.98  119  0.26     0.04    39.22      5.60 0 days 05:36:09.572399
44 59.26 19.01  146  1.76     0.25    40.98      5.85 0 days 05:51:17.042400
45 59.15 19.15  146  7.98     1.14    48.96      6.99 0 days 06:59:41.049600
46 58.10 19.42  172 63.54     9.08   112.50     16.07 0 days 16:04:16.392000
47 57.38 19.09  193 44.85     6.41   157.35     22.48 0 days 22:28:41.548800
48 56.88 18.25  222 40.73     5.82   198.08     28.30 1 days 04:17:50.445600
49 56.89 18.12  279  4.28     0.61   202.37     28.91 1 days 04:54:33.847200
50 56.93 18.12    0  2.26     0.32   204.63     29.23 1 days 05:13:57.424800
51 57.06 18.17   12  8.05     1.15   212.68     30.38 1 days 06:22:57.842400
52 57.07 18.17  350  0.94     0.13   213.62     30.52 1 days 06:31:01.639200
53 57.12 18.17    2  2.88     0.41   216.50     30.93 1 days 06:55:44.068800
54 57.21 18.09  335  6.13     0.88   222.63     31.80 1 days 07:48:15.634800
55 57.27 18.07  349  3.32     0.47   225.95     32.28 1 days 08:16:41.048400
56 57.37 18.12   15  6.05     0.86   232.00     33.14 1 days 09:08:32.002799
57 57.47 18.10  353  6.22     0.89   238.22     34.03 1 days 10:01:51.873600
58 57.55 18.09  353  4.87     0.70   243.08     34.73 1 days 10:43:34.028400
59 57.61 18.21   45  5.61     0.80   248.69     35.53 1 days 11:31:39.914399
60 57.64 18.29   55  3.00     0.43   251.70     35.96 1 days 11:57:25.038000
61 59.16 19.14   16 94.69    13.53   346.39     49.48 2 days 01:29:03.181200
62 59.28 18.94  320  9.87     1.41   356.26     50.89 2 days 02:53:37.388400
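
For reference, a table like this can be produced with a few lines of pandas; below is a minimal sketch, assuming great-circle legs and a constant boat speed of 7 knots (which the DIST/LEGTIME ratios in the table suggest), with a couple of made-up waypoints:

```python
import numpy as np
import pandas as pd

R_NM = 3440.065   # Earth radius in nautical miles
SPEED_KN = 7.0    # assumed average boat speed

def leg_dist_cog(lat1, lon1, lat2, lon2):
    """Great-circle distance (nm) and initial course between waypoint pairs."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlon = np.radians(lon2 - lon1)
    # haversine distance
    a = np.sin((p2 - p1) / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    dist = 2 * R_NM * np.arcsin(np.sqrt(a))
    # initial bearing, normalized to 0-360 degrees
    y = np.sin(dlon) * np.cos(p2)
    x = np.cos(p1) * np.sin(p2) - np.sin(p1) * np.cos(p2) * np.cos(dlon)
    cog = (np.degrees(np.arctan2(y, x)) + 360) % 360
    return dist, cog

wp = pd.DataFrame({"LAT": [59.32, 59.32, 59.34], "LON": [18.09, 18.11, 18.21]})
wp["DIST"], wp["COG"] = leg_dist_cog(wp.LAT.shift(), wp.LON.shift(), wp.LAT, wp.LON)
wp["LEGTIME"] = wp.DIST / SPEED_KN                    # hours per leg
wp["CUMDIST"], wp["TOT_TIME"] = wp.DIST.cumsum(), wp.LEGTIME.cumsum()
wp["TIMEDELTA"] = pd.to_timedelta(wp.TOT_TIME, unit="h")
print(wp)
```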


Posted in development | Leave a comment

An area preserving map projection

area preserving map projection

Posted in Maritime Technology, Nautical Information Systems | Tagged | Leave a comment

Capturing NMEA sentences over WiFi using Python

In order to figure out how the NMEA-WiFi Gateway deals with clients, e.g. whether it expects a “handshake” or any other communication setup protocol, I decided to write a simulator mimicking the gateway, and then use iRegatta 2 from Zifago to verify that it can read the simulated NMEA messages sent by my “soft” gateway.

So, in the video above, my laptop (on the right) is pretending to be the NMEA-WiFi gateway, constantly broadcasting UDP packets containing NMEA sentences onto the network, while on the left my iPad running iRegatta 2 is collecting them and displaying the information obtained from the sentences.

With the communication between the gateway and its clients now figured out, I’m able to collect full race data, including multiday races, from all the instruments onboard onto my laptop for after-race “post mortem” performance analysis.
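
For reference, a minimal sketch of such a “soft” gateway: it broadcasts checksummed NMEA sentences over UDP. Port 10110 is a common NMEA-over-WiFi default and the canned sentences are made up; the real gateway’s settings may well differ.

```python
import socket
import time

def nmea_checksum(body: str) -> str:
    """XOR of all characters between '$' and '*', as two uppercase hex digits."""
    cs = 0
    for ch in body:
        cs ^= ord(ch)
    return f"{cs:02X}"

def make_sentence(body: str) -> bytes:
    return f"${body}*{nmea_checksum(body)}\r\n".encode("ascii")

# Broadcast canned sentences once per second (hypothetical examples)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

sentences = ["IIHDT,47.5,T", "IIDBT,14.2,f,4.3,M,2.4,F"]
while True:
    for body in sentences:
        sock.sendto(make_sentence(body), ("255.255.255.255", 10110))
    time.sleep(1.0)
```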

Posted in Data Analytics, Maritime Technology, Nautical Information Systems, NMEA, Numpy, performance, Python, Simulation, TCPIP | Tagged , , , , , , , , , , , , | Leave a comment

Parsing NMEA 0183 sentences in Python

My skipper has recently bought an NMEA WiFi gateway, which means that the NMEA messages from the various onboard instruments on his yacht are broadcast on the yacht’s WiFi network. This makes it very easy to grab the NMEA messages and start trying to make sense of them.

Below is an example of parsing a handful of the message types, on a Garmin network. (Note: the data collection was done with the boat stationary, so there’s no interesting info about speed, VMG etc. in these graphs; that will have to wait until we are actually sailing… 😉)

The first graph is a frequency plot of the various NMEA message types – the most frequent messages are (remember, the boat was stationary):

IIHDT – true heading
GPWCV – waypoint closure velocity
GPZDA – time & date
IIVWR – apparent wind angle & speed (AWA/AWS)
IIVTG – track made good and ground speed
GPXTE – cross track error
IIVPW – speed parallel to wind
IIDBT – depth below transducer
GPGSV – satellites in view
IIHDM – magnetic heading
IIWHW – speed through water
GPGLL – latitude & longitude
IIMWD – wind direction & speed
GPRMC – recommended minimum data
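
Counting these prefixes is straightforward once the checksum is verified; a minimal sketch, assuming the captured sentences sit one per line in a log file (the filename is hypothetical):

```python
from collections import Counter

def valid_checksum(sentence: str) -> bool:
    """Verify the two-digit XOR checksum of a '$...*hh' NMEA sentence."""
    try:
        body, checksum = sentence.strip().lstrip("$").split("*")
    except ValueError:
        return False
    cs = 0
    for ch in body:
        cs ^= ord(ch)
    return f"{cs:02X}" == checksum.upper()

counts = Counter()
with open("nmea_log.txt") as log:          # hypothetical capture file
    for line in log:
        if line.startswith("$") and valid_checksum(line):
            counts[line[1:6]] += 1         # talker ID (2 chars) + sentence type (3 chars)

for prefix, n in counts.most_common():
    print(prefix, n)
```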

nmea_prefixes

The next plot shows the true wind speed and angle over a period of time.

nmea_TW_timeline

The last graph shows two polar plots of true and apparent wind.

nmea_wind

Finally, an idea for the main performance analysis screen:

nmea_timeline


Posted in Data Analytics, Maritime Technology, Nautical Information Systems, NMEA, Python | Tagged , , , , , | Leave a comment

New Theory Cracks Open the Black Box of Deep Learning | Quanta Magazine

A new idea is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn.
— Read at www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921/

Posted in AI, development, Machine Learning | Tagged , , | Leave a comment

Making a living as a Professional Scientific Gambler using Bayesian Inference…?

prediction_results_unibet prediction_results_M2 prediction_results_M1 profit-loss real_profits

As my readers know, over the past few weeks I’ve been conducting an experiment:

Applying scientific betting to the just-finished Ice Hockey World Championships. By “scientific”, I’m referring to the exclusive use of statistical and mathematical models, simulation, and probabilistic programming, more specifically Markov Chain Monte Carlo and Bayesian Inference.

For this experiment, I wrote a Python-based Bayesian inference engine, which fundamentally uses two main sources of information: historical game results and the official IIHF ranking table. Based on those two sources, the engine comes up with probabilities for the outcomes of each of the 64 games in the tournament.

It is also important to state that I know absolutely nothing about contemporary ice hockey: I know neither the teams nor the players, I have not watched ice hockey for at least 15 years, and my interest in the game is zip and nada, so my own knowledge about ice hockey is pretty much zero. That means that I have not used any information in my experiment other than the historical game outcome data and the official rankings. The predictions, and the betting resulting from those predictions, are thus solely based on what the inference engine predicts in terms of outcomes.

So, with these premises, how did the experiment go? Is it possible, even without any personal interest in or knowledge of a sport, to make a living as a professional gambler…?

The tournament consisted of 16 teams and 64 games. My inference engine predicted the outcomes of each of these games, and I placed bets, one or more, on most of the games.

Without revealing the actual sums involved, here are the results, both financial and for the predictions:

  • Return on Investment: 21%
  • Net Profit Margin: 5%
  • Game Outcome Prediction Success Rate: 53% [to be compared to Unibet’s 58%]
  • Ratio successful/failed bets: 27%
  • Average betting odds: 6.5
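
For the record, all of these figures fall out of a simple bet log; a minimal sketch with made-up numbers (the field names are hypothetical):

```python
# each bet: stake in $, decimal odds, and whether it won (hypothetical log)
bets = [
    {"stake": 10, "odds": 6.0, "won": True},
    {"stake": 10, "odds": 2.5, "won": False},
    {"stake": 10, "odds": 8.0, "won": False},
]

staked = sum(b["stake"] for b in bets)
returned = sum(b["stake"] * b["odds"] for b in bets if b["won"])
wins = sum(b["won"] for b in bets)

print("ROI:", (returned - staked) / staked)                      # net profit / total staked
print("won/lost ratio:", wins / (len(bets) - wins))
print("average odds:", sum(b["odds"] for b in bets) / len(bets))
```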

So, over the past couple of weeks, I made a 21% profit on top of my investment. From a financial perspective, that’s an OK ROI, and had this not been an experiment but a real attempt to make money, the profit would very likely have been (much) higher: because of the experimental nature, I placed many “stupid” bets just to see what would happen, bets which I wouldn’t have placed in a production scenario where the sole objective is to make money, not to run experiments.

What about the performance of the Python-based homemade inference engine, then…? Actually, it performed much better than I had hoped for: until the last few somewhat surprising games in the final rounds, the hit rate was around 63%, with Unibet at the same time being at 67%. Due to a couple of surprising outcomes, e.g. FIN-SUI and CAN-USA, my hit rate dropped to 53%, compared to Unibet’s final 58%. So, my homemade program performed about five percentage points worse than the prediction engine of a huge professional betting shop, with zillions of computing power as well as zillions of experts studying and knowing every aspect of each team and game. Better yet, the engine performed well enough to allow me to exit the experiment with a 21% ROI, i.e. “winning over the betting house”, which is the real measure of success or failure.

So, in summary, to answer the question “Is it possible to make a living as a professional gambler using scientific methods?”

My answer, after this experiment, is YES. And the more you are interested in the game, and the more you know about it, the better “odds” you’ll have of being successful. But this experiment has demonstrated that by applying scientific methods, you can be fairly successful even if you know absolutely nothing about the game, nor have any interest in it.

Posted in Bayes, Data Analytics, Data Driven Management, Finance, Gambling, HOCKEY-2018, Math, Numpy, Probability, PYMC, Python, Simulation, Statistics | Tagged , , , , , , , , , | Leave a comment

Scientific Gambling – Ice Hockey World Championships starting tomorrow

iihf-2018

The tournament starts tomorrow with four games. From now on, posts on this topic will appear only in the public Facebook group Scientific Gambling on Ice Hockey World Championships 2018.

So, if you want to continue following how my Bayesian inference engine performs in its attempts to scientifically predict the outcomes of the games, and how much money I’m going to win – or lose – on my gambling, check out the FB group.

ai2

Posted in Bayes, Big Data, Data Analytics, Data Driven Management, Gambling, HOCKEY-2018, Math, Numpy, Probability, PYMC, Python, Statistics | Tagged , , , , , , , , , , , , | Leave a comment

Scientific Gambling – “House Advantage”

In the previous post we looked at how betting shops, casinos etc. make money: fundamentally, by ‘salting’ the odds just a tiny bit in their favor.

Let’s use two very simple games to illustrate how this works, tossing coins and throwing dice.

Let’s start with a game of coin toss, assuming a fair coin, that is, the probability of getting heads or tails is fifty-fifty, i.e. 0.5. The corresponding ‘straight’ odds for this game are thus 2 (decimal format), or 1:1 (fractional format). By ‘straight’, I mean odds that are directly given by the outcome probabilities. However, no professional betting shop can issue straight odds; if they did, they would very soon go out of business, simply because playing with straight odds is a zero-sum game, that is, in the long run the expected value of the game is zero, both for the player and for the house.

How to see that…? Let’s pretend we play coin toss repeatedly, very many times. In the long run, we should expect to win half (50%) of the games and lose the other 50%. As an example, let’s say we play 100 times, the stake per game is 1$, and we have fair odds, that is 2 (the inverse of the 50% probability). So these 100 games, where each game costs 1$, will cost us 100$. Statistically, we should win 50 times and lose 50 times. To make this a fair game, i.e. a zero-sum game, we need to win back the cost of all 100 games, that is 100$, within the 50 winning games. Dividing 100$ by 50 wins means that each win should return 2$, which is exactly what happens with straight odds, i.e. odds of 2 for this game.

So, in order to make money, the betting shops do not use straight odds, but instead ‘salt’ the odds a tiny bit in their favor, to ensure a healthy margin. Let’s mark up the probabilities by 5%, i.e. divide the fair odds by 1.05. Now, instead of odds 2, we have odds 1.90 for both outcomes. Still, assuming a fair coin, we can expect to win 50% of the time. So, again, with 100 games we should expect to win 50. The cost of the 100 games is still 100$, but the returns now are not 50 * 2 == 100$, but 50 * 1.90 == 95$. That is, in the long run, we should expect to lose 5% of our stake. That markup of the odds, here 5%, thus generates a house advantage of 5%, which is the reason for being of any professional betting shop or casino.

For the game of dice throwing, the same thing applies: the probability of a fair die landing on any of its 6 sides is 1/6, which gives fair betting odds of 6 (decimal), or 5:1 (fractional). In order to make a profit, the betting shop must mark up the odds, exactly as in the coin-tossing example above.
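
Such simulations are a few lines of NumPy; here is a minimal sketch for the coin-toss case (the dice case is analogous):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 1_000_000
stake = 1.0

wins = rng.random(N) < 0.5                 # fair coin: 50% heads
for odds in (2.00, 1.90):                  # straight odds vs. 5%-salted odds
    returns = np.where(wins, odds * stake, 0.0) - stake
    print(f"odds {odds:.2f}: player net after {N} games = {returns.sum():+,.0f} $")
# straight odds end up near zero (zero-sum); salted odds lose the player
# about 5% of total stakes, i.e. roughly -50,000 $ over a million games
```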

Below, two graphs show simulations of 1.000.000 games of coin tossing and dice throwing, where the red lines show the cumulative results (total win/loss) after the 1.000.000 games. As can be seen, with fair odds both games are very close to zero-sum, while with a 5% markup the house makes a healthy profit of 5%, i.e. about 50.000 $.

coinsdice

Posted in Gambling, Math, Probability, Statistics | Tagged , , , | Leave a comment

Scientific Gambling – how do betting shops make money….?

Betting shops are commercial businesses, that is, they want to and must make money in order to survive, like any other business. Take a casino as an example: they make money – in the long run – by having set the odds just a tiny bit in their favor; the typical “house advantage” in games like roulette is 5.26% (American roulette) or 2.70% (European roulette). What the house advantage tells us is the relative amount a player is expected to lose on each play, in the long run. Thus, making a bet of 1$, you should expect to lose about 5 or 3 cents each time. Over time, these tiny wins for the casino accumulate to quite a lot of money. I have no idea what turnover per day the typical Las Vegas casino has, but let’s say 10.000.000 USD. 5% of that is 500.000 USD; not bad for spinning a few wheels…

The cool thing about games like Roulette is that they are based on “known unknowns”, where the risk is fully understood mathematically, i.e. all the probabilities involved are fully known.

In sports betting, on the other hand, the probabilities are not known, since we are dealing not with risk but with uncertainty, i.e. we are dealing with Unknown Unknowns. So how do betting shops like Unibet, Svenska Spel and others make money on sports betting…?

Easy: just apply a markup to the probabilities/odds. Below is an illustration, from a simulation run on my computer:

The blue line shows the cumulative returns given “fair odds”, i.e. odds that are a direct reflection of the probability of the game outcome: for instance, if the probabilities for a given game are believed to be 1/3 each for WIN, DRAW, LOSS, then each of these outcomes would have odds set to 3 (decimal), or 2:1 (fractional), that is, you’d get 3$ back for your 1$ stake if you happened to win. As can be seen from the graph, after a million games the blue line ends up a little bit above 0, meaning that the player, i.e. you, would in this case leave the casino – or betting shop – with a small gain.

The red line shows what happens to expected returns when I have applied a tiny markup to the probabilities/odds, as all betting shops and casinos do: the line grows almost monotonically towards the negative side, i.e. constantly accumulating wins for the casino/betting shop. That’s the “house advantage” at play. In this simulation run, I’ve set the house advantage, or markup, to a ridiculously low value; regardless, the result is clear: the house is making money.
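
The effect of the markup can also be seen analytically: the expected return per 1$ staked on an outcome with true probability p at decimal odds d is p * d - 1. A small sketch, using a hypothetical 5% markup (not necessarily the one used in the graph):

```python
p = 1 / 3                       # assumed true probability of the outcome
fair_odds = 1 / p               # 3.00: a direct reflection of the probability
markup = 1.05                   # hypothetical 5% markup on the probabilities
salted_odds = 1 / (p * markup)  # ~2.857

print(p * fair_odds - 1)        # 0.0     -> zero-sum game for the player
print(p * salted_odds - 1)      # ~-0.048 -> the house advantage
```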

Anyone wanting to take a guess on the house advantage set in this example…? 🙂

But remember that in sports betting we are dealing with Unknown Unknowns. That means that to safeguard against potentially huge losses due to all the uncertainties involved in sports game outcomes, the betting shops need to apply a fairly hefty markup to their odds; otherwise they would run a clear risk of going bankrupt whenever they have set the odds way too high.

“Prediction is difficult, particularly about the future.”

odds-vs-mkup-odds

Posted in Bayes, Data Analytics, Gambling, HOCKEY-2018, Math, Numpy, Probability, PYMC, Pystan, Python, Simulation, Statistics | Tagged , , , , , , , , , , , | Leave a comment

Scientific gambling – How to identify potentially profitable odds/plays?

In all sports gambling, success or failure is determined by a number of factors, luck not being the least of them, since in any sport there are loads of “Unknown Unknowns“, which we could also call “Uncertainty”. And then there is randomness.

If we compare sports betting with casino-type betting: in casino betting we can talk about “Risk” instead of “Uncertainty”, since for casino games like roulette we can mathematically calculate all the probabilities involved. In sports betting, however, no matter how much data we have, we have no way to mathematically determine the exact odds, because there are many Unknown Unknowns involved in each game. Thus, sports betting is a truly probabilistic endeavor, fully dominated by uncertainty, as opposed to calculable risk.

But, despite the presence of the Unknown Unknowns, we can attempt to do the best we can, by utilizing data that we have available, such as rankings, historical results etc. This is what my Bayesian Inference Engine does, basically calculating the most likely outcomes for any game, given the rankings and historical results (as well as the statistical model I’ve given it).

So, let’s take a concrete example of how I decide which games to bet on:

First, I execute my Bayesian engine on the rankings and historical game data that I’ve gathered, about 1100 world championship games from the year 2000 to date. Basically, what the engine does is run millions of simulations, trying to come up with model parameters that generate the same results as the historical games. Once these parameters are known, they are used to predict the outcomes of new games. Such a run can take anything from a few hours to days (or even weeks) depending on how many games the model is predicting, how much data there is, and how “deep” I’ve told it to go in its analysis. A typical run covering each day’s 4-6 games takes about 10 hours of CPU time on my machine.
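
The post doesn’t show the model itself, but a common shape for this kind of engine is a latent team-strength model fitted with MCMC. A minimal, hypothetical PyMC sketch, far simpler than the real thing and with made-up data:

```python
import numpy as np
import pymc as pm

# toy historical data: team indices and goals scored per game (made up)
team_a, team_b = np.array([0, 1, 2]), np.array([1, 2, 0])
goals_a, goals_b = np.array([3, 2, 1]), np.array([1, 2, 4])
n_teams = 3

with pm.Model() as model:
    # latent attack/defence strengths; priors could be informed by rankings
    attack = pm.Normal("attack", mu=0.0, sigma=1.0, shape=n_teams)
    defence = pm.Normal("defence", mu=0.0, sigma=1.0, shape=n_teams)
    base = pm.Normal("base", mu=1.0, sigma=1.0)  # log of the average goal rate

    # Poisson-distributed goals, rate driven by attacker vs. defender strength
    mu_a = pm.math.exp(base + attack[team_a] - defence[team_b])
    mu_b = pm.math.exp(base + attack[team_b] - defence[team_a])
    pm.Poisson("obs_a", mu=mu_a, observed=goals_a)
    pm.Poisson("obs_b", mu=mu_b, observed=goals_b)

    trace = pm.sample(2000, tune=1000)  # the MCMC step: this is the slow part

# predicting a new game = simulating scores from the posterior strengths
```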

When the Bayesian inference in the above step is done – typically run overnight – it’s time to analyze the predictions and compare them with those of a commercial betting shop, in my case Unibet, trying to identify games where my program has found an advantage in the odds setting.

The first thing I do is check for games where Unibet’s probabilities and mine differ:

probabilities

Instead of looking at the massive amount of numbers in rows and columns, the above graph gives me a quick overview of where, according to my program, there might be an odds setting that is advantageous to me: basically, I look for the differently colored sub-bars and compare their sizes. For instance, here I can see that for the game RUS-FRA, my program puts the probability of a draw quite a bit higher than Unibet does; thus, that might be a candidate for placing a bet. Let’s have a closer look:

probability_diff

The graph above shows the differences in probabilities between Unibet’s predictions and mine. The ones that are potentially interesting are the ones on the plus side, i.e. where my program has found a higher probability than Unibet. RUS-FRA is interesting, as well as SWE-BLR and SUI-AUT. Let’s dig deeper, using RUS-FRA as our example:

EV

Here we can see that, given my program’s prediction of the outcome of RUS-FRA vs. the prediction on which Unibet bases its odds, there is an opportunity to make money – IFF MY PROGRAM’S PREDICTION HAPPENS TO BE CORRECT AND THE SMALL MIRACLE OF FRA NOT LOSING HAPPENS! – since the probability of a draw, as my program sees it, is higher than how Unibet sees it, and therefore they have set the odds for a draw higher than they should be, as my prediction sees the outcome of that game.

Of course, RUS is still a huge favorite to win, for Unibet as well as for my program, with 8/10 vs. 7/10 wins for Russia, respectively, but that small difference is exploitable as a high-risk/high-reward gamble, since the odds are set for 8/10 wins, not the 7/10 wins my program predicts.
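
Stated as code, the screening rule is simply positive expected value under my own probabilities rather than the bookmaker’s. A sketch; the RUS-FRA numbers here are purely hypothetical:

```python
def expected_value(p_mine: float, bookmaker_odds: float) -> float:
    """Expected profit per 1$ staked, if my probability estimate is correct."""
    return p_mine * bookmaker_odds - 1.0

# hypothetical RUS-FRA draw: bookmaker implies ~15%, my model says ~20%
bookmaker_draw_odds = 1 / 0.15                    # ~6.67 decimal odds
print(expected_value(0.20, bookmaker_draw_odds))  # ~+0.33 -> candidate bet
```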

Let’s zoom in a bit closer to see why it might be that my program puts a bit higher probability to a draw than does Unibet:

RUSFRA_hist

Russia and France have met 4 times at world championship level since 2000, and Russia have won all but one, the 2013 game. Now, I have no insight into how Unibet’s odds compiler handles this type of “anomaly”, but that single loss could be the difference in predictions.

Anyway, it’s interesting enough for me to put some money on that game. Yes, the likelihood of winning the bet is fairly small – after all, my program thinks that RUS will beat FRA in more than 7 out of 10 games – but the upside is quite large, and thus worth the calculated risk. As stated above, sports betting has loads of Unknown Unknowns, and who knows, perhaps I’m getting lucky…? 🙂

Posted in Bayes, Data Analytics, Data Driven Management, Gambling, HOCKEY-2018, Math, Numpy, Probability, PYMC, Python, Statistics | Tagged , , , , , , , , , , , | Leave a comment