Improving Python performance by a factor of 100

I’m doing a bit of Machine Learning in my spare time: a neural network that analyzes images.

Anyways, since I don’t have zillions of computing power, any decent amount of learning data takes hours or even days to process through my (very modest, 101 x 10 x 1) neural network.

So today I spent a few hours looking for ways to improve performance. In my code, there was an obvious first candidate: the function that calculates new weights for the nodes at the various levels of the network. With 101 input nodes, 10 hidden nodes and 1 output node, there are 101 x 10 = 1010 input-to-hidden weights (plus 10 hidden-to-output weights), so each learning iteration has over a thousand weight updates to process.
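
For reference, here is a minimal sketch of the shapes the two functions further down operate on (random placeholder values only, not real data):

import numpy as np

# Dimensions stated above: 101 input nodes, 10 hidden nodes, 1 output node.
n_input, n_hidden = 101, 10

input  = np.random.rand(n_input)        # input activations
hidden = np.random.rand(n_hidden)       # hidden activations
output = np.random.rand()               # single scalar output activation

hw = np.random.rand(n_hidden, n_input)  # 10 x 101 = 1010 input-to-hidden weights
ow = np.random.rand(1, n_hidden)        # 1 x 10 hidden-to-output weights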

My initial attempt was to do the calculations in a nested for-loop. For a fairly limited set of learning data, and an equally limited set of learning iterations, the program took on the order of 10 hours of computing.

Now I changed the implementation of the function computing the new weights to skip explicit iteration altogether, and instead use numpy’s powerful matrix operations. It took a while to figure out how to do it, but boy, the difference in execution time is striking: from 10 hours to 6 minutes, i.e. a performance boost of a factor of 100!
To get a feel for what a factor of 100 means: apply that factor – either way – to your salary, monthly bills or daily commute time… 🙂
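
The core trick is that np.outer computes, in one call, all the pairwise products that a nested loop would otherwise produce one element at a time. A toy sketch with made-up numbers, just to show the equivalence:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0])

# Loop version: fill a 3 x 2 matrix of pairwise products element by element.
m_loop = np.empty((len(a), len(b)))
for i in range(len(a)):
    for j in range(len(b)):
        m_loop[i, j] = a[i] * b[j]

# Vectorized version: one call, no Python-level loop.
m_vec = np.outer(a, b)

assert np.allclose(m_loop, m_vec)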

Code below.

 
import numpy as np

def new_weights(input, hidden, output, hw, ow, error, mu):
    # Loop-based version: visit every (hidden, input) weight one at a time.
    # Assumed shapes: input (101,), hidden (10,), output scalar,
    # hw (10, 101), ow (1, 10); error and mu (learning rate) are scalars.
    for h, hval in np.ndenumerate(hidden):
        for i, ival in np.ndenumerate(input):
            slope_o = output * (1 - output)        # derivative of output sigmoid
            slope_h = hidden[h] * (1 - hidden[h])  # derivative of hidden sigmoid
            dx3dw = input[i] * slope_h * ow[0][h] * slope_o
            hw[h, i] += dx3dw * error * mu         # update input-to-hidden weight

    for h, hval in np.ndenumerate(hidden):
        slope_o = output * (1. - output)
        dx3dw = hidden[h] * slope_o
        ow[0][h] += dx3dw * error * mu             # update hidden-to-output weight
    return hw, ow

def new_weights2(input, hidden, output, hw, ow, error, mu):
    # Vectorized version: the nested loop above collapses into one outer
    # product plus numpy broadcasting, with no Python-level iteration.
    slope_o = output * (1 - output)
    slope_h = np.array(hidden * (1 - hidden))
    dx3dw = np.outer(input, slope_h) * ow * slope_o  # shape (n_input, n_hidden)
    dx3dw = dx3dw.transpose()                        # -> (n_hidden, n_input), matching hw
    hw += dx3dw * error * mu
    dx3dw0 = np.outer(hidden, slope_o)               # shape (n_hidden, 1)
    dx3dw0 = dx3dw0.transpose()                      # -> (1, n_hidden), matching ow
    ow += dx3dw0 * error * mu
    return hw, ow
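
For a rough feel for the difference on a single update, here is a minimal, hypothetical benchmark of the two functions with the dimensions above; the random placeholder arrays and the error/learning-rate values are just illustrations, not the actual training data or loop:

import timeit
import numpy as np

rng = np.random.default_rng(0)
inp    = rng.random(101)          # input activations
hidden = rng.random(10)           # hidden activations
output = rng.random()             # scalar output activation
hw     = rng.random((10, 101))    # input-to-hidden weights
ow     = rng.random((1, 10))      # hidden-to-output weights
error, mu = 0.1, 0.01             # illustrative error and learning rate

loop_t = timeit.timeit(
    lambda: new_weights(inp, hidden, output, hw.copy(), ow.copy(), error, mu),
    number=100)
vec_t = timeit.timeit(
    lambda: new_weights2(inp, hidden, output, hw.copy(), ow.copy(), error, mu),
    number=100)
print(f"loop: {loop_t:.3f}s  vectorized: {vec_t:.3f}s  speedup: ~{loop_t / vec_t:.0f}x")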

 


One Response to Improving Python performance by a factor of 100

  1. Mikael V says:

    Being able to optimise algorithms is good, but eventually you will find that your data set simply is too big and complex. Then you have to look into something like Apache Spark to distribute your workload over multiple processors/nodes. IBM Data Science Experience is a cloud-based solution with support for this, as well as Python, Jupyter Notebooks, Numpy, Tensorflow and a lot of other frameworks.
