Improving Python performance by a factor of 100

I’m doing a bit of Machine Learning in my spare time: a neural network that analyzes images.

Anyway, since I don’t have zillions of computing power, any decent amount of learning data takes hours or days to process through my (very modest, 101 x 10 x 1) neural network.

So today I spent a few hours looking for ways to improve performance. In my code there was an obvious first candidate: the function that calculates new weights for the nodes at the various levels of the network. With 101 input nodes, 10 hidden nodes, and 1 output node, there are 101 x 10 = 1010 hidden-layer weights, so each pass through a learning iteration means over a thousand weight updates to process.
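For concreteness, here is a sketch of the weight-array shapes this implies (variable names are mine, chosen to match the code further down):

```python
import numpy as np

# Hypothetical shapes for the 101 x 10 x 1 network described above:
hw = np.zeros((10, 101))  # hidden-layer weights: one row of 101 per hidden node
ow = np.zeros((1, 10))    # output-layer weights: one per hidden node
print(hw.size + ow.size)  # total weights updated each learning iteration -> 1020
```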

My initial attempt was to do the calculations in a nested for-loop. For a fairly limited set of learning data, and an equally limited number of learning iterations, the program took on the order of 10 hours of computing.

Now I changed the implementation of the function computing the new weights to skip iteration altogether, instead using numpy’s powerful matrix manipulation. It took a while to figure out how to do it, but boy, the difference in execution time is striking: from 10 hours to 6 minutes, i.e. a performance boost of a factor of 100!
To get a feel for what a factor of 100 means: apply that factor – either way – to your salary, monthly bills or daily commute time… 🙂
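The heart of the trick is replacing Python-level loops with a single NumPy call. A toy benchmark of just the outer-product step (a sketch with arbitrary data; absolute timings will vary by machine) shows the kind of gap involved:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(101)  # stand-in for the input activations
b = rng.random(10)   # stand-in for the hidden activations

def looped():
    # Element by element, as in a nested for-loop implementation.
    out = np.empty((101, 10))
    for i in range(101):
        for j in range(10):
            out[i, j] = a[i] * b[j]
    return out

def vectorized():
    # One call computes the whole 101 x 10 product.
    return np.outer(a, b)

t_loop = timeit.timeit(looped, number=200)
t_vec = timeit.timeit(vectorized, number=200)
print(f"loop {t_loop:.4f}s, vectorized {t_vec:.4f}s, ~{t_loop / t_vec:.0f}x faster")
```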

Code below.

import numpy as np

def new_weights(input, hidden, output, hw, ow, error, mu):
    # Loop version: update each hidden-layer weight individually.
    for h, hval in np.ndenumerate(hidden):
        for i, ival in np.ndenumerate(input):
            slope_o = output * (1 - output)
            slope_h = hidden[h] * (1 - hidden[h])
            dx3dw = input[i] * slope_h * ow[0][h] * slope_o
            hw[h, i] += dx3dw * error * mu

    # Then update each output-layer weight.
    for h, hval in np.ndenumerate(hidden):
        slope_o = output * (1. - output)
        dx3dw = hidden[h] * slope_o
        ow[0][h] += dx3dw * error * mu
    return hw, ow

def new_weights2(input, hidden, output, hw, ow, error, mu):
    # Vectorized version: no Python-level loops at all.
    slope_o = output * (1 - output)
    slope_h = hidden * (1 - hidden)
    # All hidden-weight gradients at once, via an outer product.
    dx3dw = np.outer(input, slope_h) * ow * slope_o
    hw += dx3dw.transpose() * error * mu
    # All output-weight gradients likewise.
    dx3dw0 = np.outer(hidden, slope_o)
    ow += dx3dw0.transpose() * error * mu
    return hw, ow
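It is worth checking that the vectorized version really computes the same numbers as the loops (as a commenter points out below). The hidden-weight gradient can be compared both ways on random data; a minimal standalone sketch, with shapes matching the 101-10-1 network:

```python
import numpy as np

rng = np.random.default_rng(1)
inp = rng.random(101)     # input activations
hid = rng.random(10)      # hidden activations
out = 0.7                 # output activation (single output node)
ow = rng.random((1, 10))  # output-layer weights

slope_o = out * (1 - out)
slope_h = hid * (1 - hid)

# Loop version: one hidden-weight gradient entry at a time.
grad_loop = np.empty((10, 101))
for h in range(10):
    for i in range(101):
        grad_loop[h, i] = inp[i] * slope_h[h] * ow[0][h] * slope_o

# Vectorized version: the whole 10 x 101 gradient from one outer product.
grad_vec = (np.outer(inp, slope_h) * ow * slope_o).transpose()

print(np.allclose(grad_loop, grad_vec))  # True
```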


About swdevperestroika

High tech industry veteran, avid hacker reluctantly transformed to mgmt consultant.
This entry was posted in AI, Complex Systems, development, Machine Learning, Neural networks, performance, software. Bookmark the permalink.

2 Responses to Improving Python performance by a factor of 100

  1. Mikael V says:

    Being able to optimise algorithms is good, but eventually you will experience that your data set simply is too big and complex. Then you have to look into something like Apache Spark to distribute your workload over multiple processors/nodes. IBM Data Science Experience is a cloud based solution with support for this as well as Python, Jupyter Notebooks, Numpy, Tensorflow and a lot of other frameworks.

  2. Joe Marasco says:

    This is a comment on the original post by Tommy on optimizing the algorithm.

    Your improvement is reminiscent of the observation frequently made by experimental physicists, that one can often save an afternoon’s research in the library with only six months’ effort in the lab. Yes, optimizing algorithms can provide huge benefits. I assume that you compared the six-minute result to the ten-hour result to verify that they were the same!
