## Accidental discovery of the Central Limit Theorem by Monte Carlo Simulation

Continuing my exploration of the Python language, I decided to use it for a very simple Monte Carlo simulation. Why such an unusual domain, you might ask…? Well, to me the most difficult part of any learning effort is identifying problems that are interesting enough to spend time and energy on. I’m simply unable to muster the energy to work on traditional programming exercises such as ‘student registration’ or ‘store inventory’; not even ‘storing mother’s recipes’ is exciting enough to spark a glimmer in my hacking eye…

But simulations are always fun, and they are also a very useful way to explore any non-deterministic, unpredictable domain, such as our businesses, our societies, or any other type of complex system.

So, I decided to visit Monte Carlo-land.

In order to keep the exercise simple enough for a Grumpy Old Man like myself to understand, I started with a problem that’s very easy to grasp but nonetheless demonstrates some perhaps unexpected properties. The problem at hand is:

Imagine you are a product manager in some commodity business, responsible for a new product to be developed. Your boss asks you to estimate the market and the expected net profit for the product, say during the first year. Your boss needs that figure in order to size the factory. After all, in today’s lean world, we do not want any waste, do we? And a factory with capacity exceeding demand is clearly waste, right…?

How would you go about that problem? One approach is to estimate parameters such as demand, price point, unit cost and fixed costs, and if you are smart, you also include some variance in your numbers, i.e. a distribution instead of a single number.

But your boss, the CEO of the business, doesn’t want to hear any ‘it could be this, or it could be that’ kind of vague answers; he wants a single number. So you give him a number, which is the average of the “low”, “medium” and “high” demand scenarios you had in your Excel chart.

The problem with this approach is known as the Flaw of Averages: a single average hides the variability in the underlying data, so a plan built on it says nothing about how likely, or how bad, the other outcomes are.
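To make the Flaw of Averages a bit more tangible, here is a minimal sketch. All the numbers, the `CAPACITY` cap and the helper `profit_with_capacity` are my own assumptions for illustration, not figures from the actual exercise. It compares the profit computed at the average demand with the average of the profits across the three scenarios, for a factory sized to the "medium" case.

```python
# Hypothetical numbers, chosen only to illustrate the Flaw of Averages.
PRICE = 10.0
UNIT_COST = 6.0
FIXED_COST = 3000.0
CAPACITY = 1000  # assumed: factory sized for the "medium" scenario


def profit_with_capacity(demand):
    """Profit when sales are capped by the factory capacity."""
    units_sold = min(demand, CAPACITY)
    return units_sold * (PRICE - UNIT_COST) - FIXED_COST


scenarios = [500, 1000, 1500]  # assumed low, medium and high demand

# What the boss gets: one profit figure, computed at the average demand.
average_demand = sum(scenarios) / len(scenarios)
profit_at_average = profit_with_capacity(average_demand)

# What actually happens on average across the three scenarios.
scenario_profits = [profit_with_capacity(d) for d in scenarios]
average_profit = sum(scenario_profits) / len(scenario_profits)

print(f"profit at average demand:    {profit_at_average:.2f}")
print(f"average of scenario profits: {average_profit:.2f}")
print(f"worst scenario:              {min(scenario_profits):.2f}")
```

With these made-up numbers the single figure looks healthier than the average over the scenarios, because high demand cannot be served beyond capacity while low demand still hurts in full, and the single figure gives no hint that one scenario loses money.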

So, a better method is to use a simulation.

I started with this simple formula for the expected net profit:

profit = demand * price - (units * unitcost + fixedcost)

Since I’ve got no MBA (thank God!), I’m not sure these are the correct terms the bean counters use, but I hope they are understandable anyway.

Basically, the profit is the income minus the costs. (Truly Brilliant! 🙂)
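In Python, the formula above could be written roughly like this. The function name `net_profit` and the sample figures are hypothetical, picked only so the arithmetic is easy to follow.

```python
def net_profit(demand, price, units, unitcost, fixedcost):
    """Income minus costs, following the formula above."""
    income = demand * price
    costs = units * unitcost + fixedcost
    return income - costs


# Hypothetical point estimates: 1000 units demanded and produced.
estimate = net_profit(demand=1000, price=10.0, units=1000,
                      unitcost=6.0, fixedcost=3000.0)
print(estimate)
```

With fixed inputs like these the function always returns the same number, which is exactly the determinism the simulation is meant to break.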

As you can see, the formula as it stands is fully deterministic as soon as we plug in the numbers. However, we all know (perhaps with the exception of MBAs and politicians) that there is bound to be some variance in those numbers. So, for each factor above, I introduced such a variance by means of a random number generator, which allowed me to set a range factor for each parameter, e.g. demand could be 0.65 to 1.30 of my initial estimate.
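A range factor of that kind can be sketched with the standard library's `random.uniform`. The helper name `vary` and the estimate of 1000 units are assumptions of mine; only the 0.65 to 1.30 range comes from the text.

```python
import random


def vary(estimate, low, high, rng=random):
    """Scale a point estimate by a factor drawn uniformly from [low, high]."""
    return estimate * rng.uniform(low, high)


# A seeded generator keeps the example reproducible.
rng = random.Random(42)

# Hypothetical demand estimate of 1000 units, varied between 65% and 130%.
samples = [vary(1000, 0.65, 1.30, rng) for _ in range(5)]
for s in samples:
    print(round(s))
```

Every call gives a different demand somewhere between 650 and 1300 units, with each value in that interval equally likely.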

I added such a factor for each parameter in the model (the formula above), and ran a simulation with 100,000 iterations, with each iteration using randomly generated variance factors.

For the first run, I applied the variance factor to only one of the model parameters, the demand. Each subsequent run added variance to one more parameter, until after four runs all four parameters in the model were drawn from a distribution.
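The four runs could be reproduced along these lines. This is a sketch under stated assumptions, not my original script: the point estimates, every range except the 0.65 to 1.30 one for demand, the name `simulate`, and the simplification that the number of units produced equals the demand are all made up for the example.

```python
import random
import statistics

# Hypothetical point estimates for the four model parameters.
ESTIMATES = {"demand": 1000, "price": 10.0, "unitcost": 6.0, "fixedcost": 3000.0}

# (low, high) range factors; only the demand range comes from the text.
RANGES = {
    "demand": (0.65, 1.30),
    "price": (0.90, 1.10),      # assumed
    "unitcost": (0.90, 1.20),   # assumed
    "fixedcost": (0.95, 1.25),  # assumed
}


def simulate(varied, iterations=100_000, seed=1):
    """Return simulated profits, randomising only the parameters in `varied`."""
    rng = random.Random(seed)
    profits = []
    for _ in range(iterations):
        p = {
            name: value * (rng.uniform(*RANGES[name]) if name in varied else 1.0)
            for name, value in ESTIMATES.items()
        }
        # Assumption: every unit demanded is produced and sold (units == demand).
        income = p["demand"] * p["price"]
        costs = p["demand"] * p["unitcost"] + p["fixedcost"]
        profits.append(income - costs)
    return profits


# Run 1 varies demand only; each later run varies one more parameter.
names = list(ESTIMATES)
for n in range(1, len(names) + 1):
    profits = simulate(names[:n])
    print(f"{n} varied: mean={statistics.mean(profits):.0f}, "
          f"stdev={statistics.stdev(profits):.0f}")
```

Feeding each list of profits into a histogram (for instance with matplotlib, if it is installed) gives one graph per run.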

Below are graphs of the four runs:

See something interesting with the graphs…?

The random number generator is (supposedly) generating a uniform distribution, that is, each of the numbers in the interval is equally likely to be picked. Therefore, the first graph, which shows the results of the first run where only a single parameter (demand) was given by a distribution instead of a number, has a fairly uniform outcome.

But look at the shape of the subsequent graphs, particularly the last one… looks pretty familiar, doesn’t it…? Yep, it does indeed look like the dreaded Gauss Curve, a.k.a. the Bell Curve, that is, the Normal Distribution!

Interesting, isn’t it: feeding several independent uniform distributions into the model results in something that looks like a normal distribution…!

I haven’t done formal stats in ages, but I had a vague notion in the back of my head about something called the Central Limit Theorem, which I never really bothered to fully understand back in school (the statistics department was such a boring place!), but now it seems that I accidentally stumbled upon it in my Python explorations! Cool! It’s never too late to learn something!
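As a rough sanity check of the Central Limit Theorem itself, here is a small sketch that uses plain sums of uniform random numbers, which is the case the theorem actually covers (my profit model also multiplies, so it is only a cousin of this). For a normal distribution about 68% of the values lie within one standard deviation of the mean, while for a single uniform variable the figure is closer to 58%. The function name `fraction_within_one_sd` and the sample sizes are my own choices.

```python
import random
import statistics


def fraction_within_one_sd(n_terms, samples=20_000, seed=7):
    """Share of sums of `n_terms` uniform draws within one std dev of the mean."""
    rng = random.Random(seed)
    totals = [sum(rng.random() for _ in range(n_terms)) for _ in range(samples)]
    mean = statistics.mean(totals)
    sd = statistics.pstdev(totals)
    return sum(abs(t - mean) <= sd for t in totals) / samples


# The more uniform terms we add together, the closer we should get to ~0.68.
for n in (1, 2, 4, 12):
    print(n, round(fraction_within_one_sd(n), 3))
```

The share should creep from the uniform value towards the normal one as more terms are added, which is the same effect the graphs show in a less formal way.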

For those of you who might want to know more about Monte Carlo, have a look at Chris Stucchio’s brilliant article about how to apply the technique for political decision making.