I’m flabbergasted…! In less than 70 lines (seven-zero!) of Python/Keras code, and after a training session of about 20-30 min on my laptop, the CNN is able to correctly identify 99.2% of the images…!

These really powerful libraries are way cool – almost to the point where they take the fun & challenge out of it…!

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(4711)  # for reproducibility

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist
from keras import backend as K

K.set_image_dim_ordering('th')

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

print('X_train', X_train.shape)
print('X_test', X_test.shape)

Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(1, 28, 28)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(X_train, Y_train, batch_size=32, epochs=10, verbose=1)
model.save('mnist-model.h5')

score = model.evaluate(X_test, Y_test, verbose=1)
print(score)

plt.figure(figsize=(18, 12))
plt.title('Neural Network Training Progress')
plt.grid(which='both')
plt.xlabel('Epoch (60000 iterations each)')
plt.ylabel('percentage images correctly identified')
plt.plot(history.history['acc'])
plt.savefig('keras_cnn_example.jpg', format='jpg')
plt.show()


32 teams, 64 games. 3 different ranking models tested: FIFA’s official ranking, a static “wisdom-of-crowds” model, and a dynamic version of the wisdom-of-crowds model.

Prediction results: 67% of game outcomes correctly predicted.

Betting results, best strategy (max probability), with uniform stakes on all 64 games: 32% profit (measured over the total stake)

Notable: all three medalists (France, Croatia, Belgium) are “outliers”, i.e. outside an 89% confidence interval, in all three ranking schemes.

In order to figure out how the NMEA-WiFi Gateway deals with clients, e.g. whether it expects any “handshake” or other communication setup protocol, I decided to write a simulator mimicking the gateway, and then use iRegatta 2 from Zifago to verify that it can read the simulated NMEA messages sent by my “soft” gateway.

So, in the video above, my laptop (on the right) is pretending to be the NMEA-WiFi gateway, constantly broadcasting UDP packets containing NMEA sentences onto the network, while on the left my iPad running iRegatta 2 is collecting them and displaying the information obtained from the sentences.
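A minimal “soft” gateway of this kind can be sketched in a few lines of Python. The port number, function names and sentence content below are my own illustrative placeholders, not the real gateway’s settings; real NMEA 0183 sentences end with a checksum formed by XOR-ing the characters between “$” and “*”.

```python
import socket
from functools import reduce

def nmea_checksum(payload):
    """XOR of all characters between '$' and '*', per NMEA 0183."""
    return '%02X' % reduce(lambda acc, ch: acc ^ ord(ch), payload, 0)

def nmea_sentence(payload):
    """Wrap a raw payload such as 'IIDBT,...' into a full NMEA sentence."""
    return '$%s*%s\r\n' % (payload, nmea_checksum(payload))

def broadcast(payload, port=10110):
    """Broadcast one NMEA sentence as a UDP packet.

    10110 is a commonly used NMEA-over-UDP port; the real gateway's
    port may well differ."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.sendto(nmea_sentence(payload).encode('ascii'), ('<broadcast>', port))
    sock.close()

# A made-up depth-below-transducer sentence, just to show the framing:
print(nmea_sentence('IIDBT,012.3,f,003.7,M,002.0,F').strip())
```

Sending such framed sentences in a loop, with a short sleep between them, is all it takes for a client like iRegatta 2 to start displaying data.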

With the communication between the Gateway and its clients now figured out, I’m able to collect full race data, including multiday races, from all the instruments onboard onto my laptop for after-race “post mortem” performance analysis.

Below is an example of parsing a handful of the message types on a Garmin network (note: the data collection was done with the boat stationary, so there’s no interesting info about speed, VMG etc. in these graphs; that will have to wait until we are actually sailing…)

The first graph is a frequency plot over the various NMEA message types – the most frequent messages are (remember, the boat was stationary):

IIHDT – true heading

GPWCV – waypoint closure velocity

GPZDA – time & date

IIVWR – AWA & AWS

IIVTG – track made good and ground speed

GPXTE – cross track error

IIVPW – speed parallel to wind

IIDBT – depth below transducer

GPGSV – satellites in view

IIHDM – magnetic heading

IIVHW – speed through water

GPGLL – lat & long

IIMWD – wind direction & speed

GPRMC – recommended minimum data

The next plot shows the true wind speed and angle over a period of time.

The last graph shows two polar plots over true and apparent wind.

Finally, an idea for the main performance analysis screen:

import re
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime as dt
import matplotlib.dates as pldt
from collections import Counter

def parse_time(sentence):
    pattern = r'[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+,[0-9]+'
    result = re.match(pattern, sentence)
    if result:
        return result.group().split(' ')
    return None

### VMG ###
def parse_iivpw(sentence):
    pattern = r'\$IIVPW,[0-9]+\.[0-9]+,N'
    result = re.findall(pattern, sentence)
    if result:
        return result[0].split(',')
    return None

### TWA, TWS ###
def parse_iivwt(sentence):
    pattern = r"\$IIVWT,[0-9]+,(?:L|R),[0-9]+\.[0-9]+,N,[0-9]+\.[0-9]+,M"
    result = re.findall(pattern, sentence)
    if result:
        return result[0].split(',')
    return None

### AWA, AWS ###
def parse_iivwr(sentence):
    pattern = r"\$IIVWR,[0-9]+,(?:L|R),[0-9]+\.[0-9]+,N,[0-9]+\.[0-9]+,M"
    result = re.findall(pattern, sentence)
    if result:
        return result[0].split(',')
    return None

### STW ###
def parse_iivhw(sentence):
    pattern = r'\$IIVHW,,T,,M,[0-9]+\.[0-9]+,N,[0-9]+\.[0-9]+,K'
    result = re.findall(pattern, sentence)
    if result:
        return result[0].split(',')
    return None  # fallback return was missing in the first version

def prefix(sentence):
    pattern = r'\$[A-Z]+,'
    result = re.findall(pattern, sentence)
    if result:
        return result[0].split(',')[0]
    return None

def degtorad(twa, side):
    # port-side wind angles become negative before conversion to radians
    if side == 'L':
        twa = twa * -1
    return np.deg2rad(twa)

def strip_leading_zero(s):
    if s[0] == '0':
        return s[1:]
    return s  # the first version returned None here, silently dropping values

all_iivwr = []
all_iivwt = []
all_iivhw = []
all_iivpw = []
all_prefixes = []

f = open('udp_rx.log', 'r')
nr_lines = 0
for line in f:
    already_parsed = False
    time = line.split()[1]
    pf = prefix(line)
    if pf != None:
        all_prefixes.append(pf)
    msg = parse_iivwr(line)
    if msg:
        msg.append(time)
        all_iivwr.append(msg)
        already_parsed = True
    if not already_parsed:
        msg = parse_iivwt(line)
        if msg:
            msg.append(time)
            all_iivwt.append(msg)
            already_parsed = True
    if not already_parsed:
        msg = parse_iivhw(line)
        if msg:
            msg.append(time)
            all_iivhw.append(msg)
            already_parsed = True
    if not already_parsed:
        msg = parse_iivpw(line)
        if msg:
            msg.append(time)
            all_iivpw.append(msg)
    nr_lines += 1

prefix_counts = Counter(all_prefixes)

iivpw_df = pd.DataFrame(all_iivpw)
iivpw_df.columns = ['TYPE', 'VMG', 'N', 'TIME']
iivpw_df.drop('N', inplace=True, axis=1)
iivpw_df['VMG'] = pd.to_numeric(iivpw_df['VMG'])  # needed for the numeric min/max below
iivpw_df['TIME'] = iivpw_df['TIME'].astype(dt.datetime)
iivpw_df.set_index('TIME', inplace=True)

# remove empty elements in sublists
all_iivhw = [[item for item in sublist if item] for sublist in all_iivhw]
iivhw_df = pd.DataFrame(all_iivhw)
iivhw_df.columns = ['TYPE', 'T', 'M', 'KNOTS', 'N', 'KM', 'K', 'TIME']
iivhw_df['KNOTS'] = pd.to_numeric(iivhw_df['KNOTS'])
iivhw_df['KM'] = pd.to_numeric(iivhw_df['KM'])
iivhw_df.set_index('TIME', inplace=True)
iivhw_df.drop(['T', 'M', 'N', 'KM', 'K'], inplace=True, axis=1)

iivwt_df = pd.DataFrame(all_iivwt)
iivwt_df.columns = ['TYPE', 'TWA', 'SIDE', 'KNOTS', 'N', 'MS', 'M', 'TIME']
iivwt_df['TWA'] = pd.to_numeric(iivwt_df['TWA'].apply(strip_leading_zero))
iivwt_df['KNOTS'] = pd.to_numeric(iivwt_df['KNOTS'].apply(strip_leading_zero))
iivwt_df['MS'] = pd.to_numeric(iivwt_df['MS'].apply(strip_leading_zero))
iivwt_df['RAD'] = np.vectorize(degtorad)(iivwt_df['TWA'], iivwt_df['SIDE'])
iivwt_df.set_index('TIME', inplace=True)
iivwt_df.drop(['KNOTS', 'N', 'M'], inplace=True, axis=1)

iivwr_df = pd.DataFrame(all_iivwr)
iivwr_df.columns = ['TYPE', 'TWA', 'SIDE', 'KNOTS', 'N', 'MS', 'M', 'TIME']
iivwr_df['TWA'] = pd.to_numeric(iivwr_df['TWA'].apply(strip_leading_zero))
iivwr_df['KNOTS'] = pd.to_numeric(iivwr_df['KNOTS'].apply(strip_leading_zero))
iivwr_df['MS'] = pd.to_numeric(iivwr_df['MS'].apply(strip_leading_zero))
iivwr_df['RAD'] = np.vectorize(degtorad)(iivwr_df['TWA'], iivwr_df['SIDE'])
iivwr_df.set_index('TIME', inplace=True)
iivwr_df.drop(['KNOTS', 'N', 'M'], inplace=True, axis=1)

print(iivhw_df.head())
print(iivwt_df.head())
print(iivwr_df.head())
print(iivpw_df.head())

print('max TWS:', iivwt_df['MS'].max(), 'm/s')
print('min TWS:', iivwt_df['MS'].min(), 'm/s')
print('max TWA:', np.rad2deg(iivwt_df['RAD'].max()), 'deg')
print('min TWA:', np.rad2deg(iivwt_df['RAD'].min()), 'deg')
print('max AWS:', iivwr_df['MS'].max(), 'm/s')
print('min AWS:', iivwr_df['MS'].min(), 'm/s')
print('max AWA:', np.rad2deg(iivwr_df['RAD'].max()), 'deg')
print('min AWA:', np.rad2deg(iivwr_df['RAD'].min()), 'deg')
print('max STW:', iivhw_df['KNOTS'].max(), 'knots')
print('min STW:', iivhw_df['KNOTS'].min(), 'knots')
print('max VMG:', iivpw_df['VMG'].max(), 'knots')
print('min VMG:', iivpw_df['VMG'].min(), 'knots')

joint = pd.concat([iivwt_df, iivhw_df, iivwr_df, iivpw_df], axis=0)
joint.sort_index(inplace=True)
joint = joint[['TYPE', 'KNOTS', 'MS', 'TWA', 'SIDE', 'VMG']]
cols = ['TYPE', 'BOAT_SPEED', 'WIND_SPEED', 'WIND_ANGLE', 'SIDE', 'VMG']
joint.columns = cols
joint.index = joint.index.astype(dt.datetime)
print(joint.head())

print('number of true wind data:', len(iivwt_df))
print('number of apparent wind data:', len(iivwr_df))
print('number of STW data:', len(iivhw_df))
print('number of VMG data:', len(iivpw_df))
print('total number of sentences:', nr_lines - 3)

### PATTERN ###
pfix_labels, pfix_values = zip(*prefix_counts.most_common())

plt.figure(figsize=(18, 12))
ax = plt.subplot(121, polar=True)
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
ax.set_title('TWA & TWS [m/s]')
tw_thetas = iivwt_df['RAD']
tw_radi = iivwt_df['MS']
aw_thetas = iivwr_df['RAD']
aw_radi = iivwr_df['MS']
max_radi = max(max(tw_radi), max(aw_radi))
ax.set_ylim(0, np.ceil(max_radi))
tw = ax.scatter(tw_thetas, tw_radi, marker='.', facecolors='none', c=tw_radi, cmap='jet', alpha=0.7)
tw_cbar = plt.colorbar(tw)
tw_cbar.set_label('TWS [m/s]')

ax = plt.subplot(122, polar=True)
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
ax.set_title('AWA & AWS [m/s]')
ax.set_ylim(0, np.ceil(max_radi))
aw = ax.scatter(aw_thetas, aw_radi, marker='|', c=aw_radi, cmap='jet', alpha=0.7)
aw_cbar = plt.colorbar(aw)
aw_cbar.set_label('AWS [m/s]')
plt.tight_layout()
plt.savefig('nmea_wind_polar.jpg', format='jpg')

plt.figure(figsize=(18, 12))
ax1 = plt.subplot(2, 1, 1)
plt.title('TWS')
every_nth = 100
plt.grid(which='major')
ax1.plot(iivwt_df.index, iivwt_df['MS'])
for n, label in enumerate(ax1.xaxis.get_ticklabels()):
    if n % every_nth != 0:
        label.set_visible(False)
ax1.set_ylabel('TWS [m/s]')
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
plt.title('TWA')
plt.grid(which='major')
plt.plot(iivwt_df.index, np.rad2deg(iivwt_df['RAD']))
for n, label in enumerate(ax2.xaxis.get_ticklabels()):
    if n % every_nth != 0:
        label.set_visible(False)
ax2.set_ylabel('TWA [deg]')
plt.savefig('nmea_TW_timeline.jpg', format='jpg')

plt.figure(figsize=(18, 12))
plt.title('NMEA message frequency on Garmin network')
plt.grid(which='both')
plt.ylabel('Message count')
plt.xlabel('Message prefix')
plt.bar(range(len(pfix_values)), pfix_values)
plt.xticks(range(len(pfix_values)), pfix_labels, rotation='vertical')
plt.savefig('nmea_prefixes.jpg', format='jpg')
plt.show()


— Read at www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921/

Applying scientific betting to the just finished Ice Hockey World Championships. By “scientific”, I’m referring to the exclusive use of statistical and mathematical models, simulation, and probabilistic programming – more specifically, Markov Chain Monte Carlo and Bayesian inference.

For this experiment, I wrote a Python-based Bayesian inference engine, which fundamentally uses two main sources of information: historical game results and the official IIHF rankings table. Based on those two sources, the engine comes up with probabilities for the outcomes of each of the 64 games in the tournament.
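The engine’s internals aren’t shown here, but the core Bayesian idea can be illustrated with a deliberately simplified sketch: treat each pairing’s win probability as a Beta-distributed unknown, update it with historical head-to-head results, and nudge the prior using the ranking difference. The function names and the ranking adjustment below are my own illustrative assumptions, not the actual engine.

```python
# Simplified Beta-Binomial sketch of ranking-informed win probabilities.
# All names and the ranking-to-prior scaling are illustrative assumptions.

def win_probability(wins, games, rank_a, rank_b, prior_strength=2.0):
    """Posterior mean of P(team A beats team B).

    wins, games   : historical head-to-head record of A vs B
    rank_a, rank_b: IIHF-style ranking positions (lower = better)
    """
    # Map ranking difference into a prior in (0, 1); equal ranks -> 0.5
    rank_edge = 0.5 + 0.03 * (rank_b - rank_a)   # illustrative scaling factor
    rank_edge = min(max(rank_edge, 0.05), 0.95)  # clamp to a sane range
    a = prior_strength * rank_edge               # Beta prior pseudo-wins
    b = prior_strength * (1.0 - rank_edge)       # Beta prior pseudo-losses
    return (wins + a) / (games + a + b)          # Beta-Binomial posterior mean

# Equally ranked teams, 7 wins in 10 previous meetings:
p = win_probability(7, 10, rank_a=5, rank_b=5)
print('P(A beats B) = %.3f' % p)  # (7 + 1) / (10 + 2) = 0.667
```

With no game history at all, the estimate falls back on the rankings alone; as results accumulate, the data dominates the prior. The real engine adds MCMC simulation on top of this kind of model.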

It is also important to state that I know absolutely nothing about contemporary ice hockey: I don’t know the teams nor the players, I have not watched ice hockey for at least 15 years, and my interest in the game is zip and nada, so my own knowledge about ice hockey is pretty much zero. That means that I have not used any information in my experiment other than the historical game outcome data and the official rankings. The predictions, and the betting resulting from those predictions, are thus based solely on the outcomes the inference engine predicts.

So, with these premises, how did the experiment go? Is it possible, even without any interest in or knowledge of a sport, to make a living as a professional gambler…?

The tournament consisted of 16 teams and 64 games. My inference engine predicted the outcomes of each of these games, and I placed bets, one or more, on most of the games.

Without revealing the actual sums involved, here are the results, both financial and for the predictions:

- Return on Investment: 21%
- Net Profit Margin: 5%
- Game Outcome Prediction Success Rate: 53% [to be compared to Unibet’s 58%]
- Ratio successful/failed bets: 27%
- Average betting odds: 6.5

So, over the past couple of weeks, I made a 21% profit on top of my investment. From a financial perspective, that’s an OK ROI, and had this been not an experiment but a real attempt to make money, the profit would very likely have been (much) higher: because of the experimental status, I placed many “stupid” bets just to see what happens – bets I wouldn’t have placed in a production scenario, where the sole objective would be to make money, not to run experiments.

What about the performance of the Python-based home-made inference engine, then…? Actually, it performed much better than I had hoped for: until the last few somewhat surprising games in the final rounds, the hit rate was around 63%, with Unibet at the same time being at 67%. Due to a couple of surprising outcomes, e.g. FIN-SUI and CAN-USA, my hit rate dropped to 53%, compared to Unibet’s final 58%. So, my home-made program performed about 5 percentage points worse than the prediction engine of a huge professional betting shop, with zillions of computing power and zillions of experts studying and knowing every aspect of each team and game. Better yet, the engine performed well enough to allow me to exit the experiment with a 21% ROI, i.e. “winning over the betting house”, which is the real measure of success or failure.

So, in summary, to answer the question “Is it possible to make a living as a professional gambler using scientific methods ?”

My answer, after this experiment, is YES. And the more interested you are in the game, and the more you know about it, the better “odds” you’ll have of being successful. But this experiment has demonstrated that by applying scientific methods, you can be fairly successful even if you know absolutely nothing about the game, nor have any interest in it.


The tournament is starting tomorrow with four games. From now on, future posts on this topic will appear only on the public Facebook group Scientific Gambling on Ice Hockey World Championships 2018.

So, if you want to continue following how my Bayesian inference engine performs in its attempts to scientifically predict the outcomes of the games, and how much money I’m going to win – or lose – on my gambling, check out the FB group.

Let’s use two very simple games to illustrate how this works: tossing coins and throwing dice.

Let’s start with a game of coin toss, assuming a fair coin: the probability of getting heads or tails is fifty-fifty, that is, 0.5. The corresponding ‘straight’ odds for this game are thus 2 (decimal format), or 1:1 (fractional format). By ‘straight’, I mean odds given directly by the outcome probabilities. However, no professional betting shop can issue straight odds; if they did, they would very soon go out of business, simply because playing with straight odds is a zero-sum game – in the long run, the expected value of the game is zero, for both the player and the house.
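The mapping from probability to ‘straight’ odds is simple enough to put in a couple of lines (the function names are my own):

```python
from fractions import Fraction

def decimal_odds(p):
    """'Straight' decimal odds are just the inverse of the probability."""
    return 1.0 / p

def fractional_odds(p):
    """'Straight' fractional odds: net winnings per unit staked, as 'a:b'."""
    f = Fraction(1, 1) / Fraction(p) - 1  # (1 - p) / p
    return '%d:%d' % (f.numerator, f.denominator)

print(decimal_odds(0.5))                # 2.0
print(fractional_odds(Fraction(1, 2)))  # 1:1, the fair coin
print(decimal_odds(1 / 6))              # ~6.0, up to float rounding
print(fractional_odds(Fraction(1, 6)))  # 5:1, the fair die
```

Passing the probability as a `Fraction` keeps the fractional form exact, which floats wouldn’t.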

How to see that…? Let’s pretend we play coin toss repeatedly, very many times. In the long run, we should expect to win half (50%) of the games and lose the other 50%. As an example, let’s say we play 100 times, the stake per game is 1$, and we have fair odds of 2 (the inverse of the 50% probability). These 100 games, at 1$ each, will cost us 100$. Statistically, we should win 50 times and lose 50 times. To make this a fair game, i.e. a zero-sum game, we need to win back the cost of all 100 games, that is 100$, within the 50 winning games. Dividing 100$ by 50 wins means that each win should return 2$, which is exactly what happens with straight odds of 2 for this game.

So, in order to make money, the betting shops do not use straight odds, but instead ‘salt’ the odds a tiny bit in their favor, to ensure a healthy margin. Let’s mark up the probabilities by 5%, i.e. apply a markup of 1.05 to the odds. Now, instead of odds of 2, we have odds of 1.90 for both outcomes. Still, assuming a fair coin, we can expect to win 50% of the time. So, again, with 100 games, we should expect to win 50. The cost of the 100 games is still 100$, but the returns now are not 50 * 2 == 100$, but 50 * 1.90 == 95$. That is, in the long run, we should expect to lose 5% of our stake. That markup of the odds, here 5%, thus generates a house advantage of 5%, which is the reason for being of any professional betting shop or casino.
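The arithmetic above boils down to a one-line expected value: for win probability p and decimal odds o, the expected net return per unit staked is p * o - 1. A tiny sketch (the function name is my own):

```python
def expected_return(p, odds):
    """Expected net return per unit staked: p * odds - 1."""
    return p * odds - 1.0

fair = expected_return(0.5, 2.0)     # straight odds: zero-sum game
salted = expected_return(0.5, 1.90)  # the marked-up odds from the text

print('fair game : %.4f' % fair)    # 0.0000
print('5%% markup: %.4f' % salted)  # -0.0500, the house's 5% edge
```

Any positive-expectation bet requires the player’s probability estimate to be better than the one baked into the marked-up odds, which is exactly what the inference engine attempts.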

For the game of throwing dice, the same thing applies: the probability of a fair die landing on any of its 6 sides is 1/6, which gives fair betting odds of 6 (decimal), or 5:1 (fractional). In order to make a profit, the betting shop must mark up the odds, exactly as in the coin tossing example above.

Below, two graphs show simulations of 1.000.000 games each of coin tossing and dice throwing, where the red lines show the cumulative results (total win/loss) after the 1.000.000 games. As can be seen, with fair odds both games are very close to zero-sum, while with a 5% markup the house makes a healthy profit of about 5%, i.e. about 50.000 $.
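The simulation behind such graphs can be sketched in a few lines of NumPy. The seed and function names are my own, and this is not the exact code that produced the plots, just the same idea:

```python
import numpy as np

def simulate(n_games, p_win, fair_odds, markup=1.0, seed=4711):
    """Simulate n_games unit-stake bets and return the house's total profit.

    Each game costs 1; a winning bet pays back fair_odds / markup."""
    rng = np.random.default_rng(seed)
    wins = rng.random(n_games) < p_win           # True where the player wins
    payout = fair_odds / markup                  # marked-up decimal odds
    player_net = wins.sum() * payout - n_games   # total returns minus stakes
    return -player_net                           # house profit = player loss

n = 1_000_000
for game, p, odds in [('coin', 0.5, 2.0), ('dice', 1 / 6, 6.0)]:
    print(game, 'fair  :', simulate(n, p, odds))
    print(game, 'salted:', simulate(n, p, odds, markup=1.05))
```

With fair odds the house profit hovers around zero; with the markup it converges to roughly 5% of the total turnover, matching the red lines in the graphs.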
