I’m doing a bit of machine learning in my spare time: a neural network that analyzes images.

Anyways, since I don’t have zillions of computing power, any decent amount of learning data takes hours or even days to process through my (very modest, 101 x 10 x 1) neural network.

So today I spent a few hours looking for ways to improve performance. In my code there was an obvious first candidate: the function that calculates new weights for the nodes at the various levels of the network. With 101 input nodes, 10 hidden nodes, and 1 output node, there are 101 x 10 = 1,010 hidden-layer weights (plus the 10 output weights), so each learning iteration means over a thousand individual weight updates.
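In NumPy terms that boils down to two small weight matrices. Here is a sketch of the shapes involved (the variable names are hypothetical, not from my actual code):

```python
import numpy as np

n_in, n_hidden, n_out = 101, 10, 1

hw = np.zeros((n_hidden, n_in))   # hidden-layer weights: 10 x 101 = 1,010 values
ow = np.zeros((n_out, n_hidden))  # output-layer weights: 1 x 10 = 10 values

print(hw.size + ow.size)  # 1020 weights touched per learning iteration
```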

My initial attempt was to do the calculations in a nested for-loop. For a fairly limited set of learning data, and an equally limited number of learning iterations, the program took on the order of 10 hours of computing.

Now I have changed the implementation of the function that computes the new weights to skip iteration altogether, instead using NumPy’s powerful matrix manipulation. It took a while to figure out how to do it, but boy, the difference in execution time is striking: from 10 hours to 6 minutes, i.e. a performance boost of a factor of 100!

To get a feel for what a factor of 100 means: apply it – either way – to your salary, monthly bills or daily commute time… 🙂

Code below.

```python
import numpy as np

# Original version: updates every weight individually in nested loops
def new_weights(input, hidden, output, hw, ow, error, mu):
    # Hidden-layer weights: one update per (hidden node, input node) pair
    for h, hval in np.ndenumerate(hidden):
        for i, ival in np.ndenumerate(input):
            slope_o = output * (1 - output)
            slope_h = hidden[h] * (1 - hidden[h])
            dx3dw = input[i] * slope_h * ow[0][h] * slope_o
            hw[h, i] += dx3dw * error * mu
    # Output-layer weights: one update per hidden node
    for h, hval in np.ndenumerate(hidden):
        slope_o = output * (1. - output)
        dx3dw = hidden[h] * slope_o
        ow[0][h] += dx3dw * error * mu
    return hw, ow
```

```python
# Vectorised version: the same updates expressed as matrix operations
def new_weights2(input, hidden, output, hw, ow, error, mu):
    slope_o = output * (1 - output)
    slope_h = np.array(hidden * (1 - hidden))
    # One outer product covers all input x hidden weight combinations at once
    dx3dw = np.outer(input, slope_h) * ow * slope_o
    dx3dw = dx3dw.transpose()
    hw += dx3dw * error * mu
    # Output-layer weights in one go
    dx3dw0 = np.outer(hidden, slope_o)
    dx3dw0 = dx3dw0.transpose()
    ow += dx3dw0 * error * mu
    return hw, ow
```
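A quick way to convince yourself that the vectorised update really matches the loop version is a sanity check on random data. The sketch below re-implements both hidden-weight updates inline and compares the results; the variable names and the 101 x 10 x 1 shapes are mine, chosen to match the network above:

```python
import numpy as np

np.random.seed(0)
inp = np.random.rand(101)        # input activations
hid = np.random.rand(10)         # hidden activations
out = np.random.rand(1)          # output activation
ow = np.random.rand(1, 10)       # output-layer weights
hw = np.zeros((10, 101))         # hidden-layer weights
error, mu = 0.5, 0.1

slope_o = out * (1 - out)

# Loop version: one update per weight
hw_loop = hw.copy()
for h in range(10):
    slope_h = hid[h] * (1 - hid[h])
    for i in range(101):
        hw_loop[h, i] += inp[i] * slope_h * ow[0][h] * slope_o * error * mu

# Vectorised version: one outer product for all 1,010 weights
slope_h_vec = hid * (1 - hid)
hw_vec = hw + (np.outer(inp, slope_h_vec) * ow * slope_o).T * error * mu

print(np.allclose(hw_loop, hw_vec))  # prints True
```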

Being able to optimise algorithms is good, but eventually you will find that your data set is simply too big and complex. Then you have to look into something like Apache Spark to distribute your workload over multiple processors/nodes. IBM Data Science Experience is a cloud-based solution with support for this, as well as for Python, Jupyter Notebooks, NumPy, TensorFlow and a lot of other frameworks.