Python matplotlib, Monte Carlo simulation, and basic statistics

Image

Having spent some evenings on Monte Carlo simulations in Python, where I cheated on the plotting part by using Excel on the files I generated from Python, I decided to bite the bullet and install Python’s huge matplotlib. I had tried to do so a couple of times before, but the build had always failed with some compilation error (matplotlib seems to be implemented in C), and I didn’t want to spend time fixing the problem. But today, inspired by Chris Stucchio’s example, I decided to give it another try.

Checking the info at http://matplotlib.org/users/installing.html, I noticed that matplotlib might require a Windows DLL, msvcp71.dll, which I duly downloaded and installed. Then I did a source install of matplotlib from Git.

This time the build succeeded, and I’m now able to use all the powerful functions of matplotlib for generating graphs. No need for Excel anymore.

The first experiment was to use matplotlib to plot a normal distribution and a uniform distribution; the result of that exercise is shown above.
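A minimal sketch of how such a first experiment could look. The post does not give the parameters used, so the standard normal (mean 0, standard deviation 1), the uniform range 0 to 1, the sample size and the names `draw_samples` and `plot_samples` are all assumptions made for illustration. Sampling uses only the standard library; matplotlib is imported only inside the plotting function.

```python
import random


def draw_samples(n, seed=None):
    """Return n normal and n uniform samples (parameters are assumptions)."""
    rng = random.Random(seed)
    normal = [rng.gauss(0.0, 1.0) for _ in range(n)]     # mean 0, sd 1
    uniform = [rng.uniform(0.0, 1.0) for _ in range(n)]  # range 0 to 1
    return normal, uniform


def plot_samples(normal, uniform, bins=50):
    """Show the two samples as stacked histograms."""
    import matplotlib.pyplot as plt  # imported here so sampling works without it

    fig, (top, bottom) = plt.subplots(2, 1)
    top.hist(normal, bins=bins)
    top.set_title("Normal distribution")
    bottom.hist(uniform, bins=bins)
    bottom.set_title("Uniform distribution")
    fig.tight_layout()
    plt.show()


normal, uniform = draw_samples(10000, seed=1)
```

Calling `plot_samples(normal, uniform)` then opens a window with the two histograms.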

The next experiment was to do a simple Monte Carlo simulation, simulating two sports teams, with identical performance capacity in three areas, but where the variability differs slightly:

teamperfA = [0.9,0.9,0.8]
teamperfB = [0.9,0.9,0.8]
teamvarianceA = [0.02,0.05,0.01]
teamvarianceB = [0.05,0.1,0.04]

Basically, the two teams have identical capacity in all three skills dimensions of interest, but team B has a slightly higher variance in each dimension.

For each skills dimension, the variability number is used as the standard deviation of a random variable with a mean of 1. That is, each performance factor is multiplied by a draw from a normal distribution with mean 1 and the standard deviation given in the lists above.
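To make the role of the variability number concrete, here is a small sketch for a single skills dimension, using only the standard library. The helper name `simulate_factor` and the seed are assumptions for illustration. Note that the lists are named "variance" but their values are used as standard deviations, which is also what `random.gauss` expects as its second argument.

```python
import random
import statistics


def simulate_factor(perf, sd, runs, rng):
    """Draw `runs` values of perf * X, where X is normal with mean 1 and sd `sd`."""
    return [perf * rng.gauss(1.0, sd) for _ in range(runs)]


rng = random.Random(42)  # fixed seed, chosen arbitrarily for repeatability
steady = simulate_factor(0.9, 0.02, 10000, rng)   # team A, first dimension
erratic = simulate_factor(0.9, 0.05, 10000, rng)  # team B, first dimension

for label, values in (("sd=0.02", steady), ("sd=0.05", erratic)):
    print(label,
          "mean %.3f" % statistics.mean(values),
          "spread %.3f" % statistics.stdev(values))
```

Both samples should center near 0.9, while the spread grows in proportion to the standard deviation passed in (roughly 0.9 times that value).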

Plugging these numbers into a very simple formula simulating the result of 10000 competitions between the two teams resulted in the graph below:

Image

As can be seen, team A is much more consistent in its results, with less variability on either side of the mean, while team B appears to be more of a ‘risk taker’: it sometimes performs very badly and sometimes very well.
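The difference in consistency can also be estimated without simulating, assuming the random factors are independent: the mean of the weighted sum is the sum of perf times weight, and its variance is the sum of the squared (perf times weight times standard deviation) terms. The sketch below is an illustration under that assumption; the function name `expected_mean_and_sd` is made up, the weights are taken from the script at the end of this post, and only the first two dimensions are used because that script loops over `range(len(perf) - 1)`.

```python
import math


def expected_mean_and_sd(perf, sd, weights):
    """Mean and sd of sum(perf[i] * weights[i] * X_i) for independent
    normal X_i with mean 1 and standard deviation sd[i]."""
    terms = list(zip(perf, sd, weights))
    mean = sum(p * w for p, _, w in terms)
    variance = sum((p * w * s) ** 2 for p, s, w in terms)
    return mean, math.sqrt(variance)


perf = [0.9, 0.9, 0.8]
sd_a = [0.02, 0.05, 0.01]
sd_b = [0.05, 0.1, 0.04]
weights = [1.0, 0.95, 0.8]

# Mirror the script: only the first len(perf) - 1 dimensions are summed.
n = len(perf) - 1
mean_a, spread_a = expected_mean_and_sd(perf[:n], sd_a[:n], weights[:n])
mean_b, spread_b = expected_mean_and_sd(perf[:n], sd_b[:n], weights[:n])

print("Team A: mean %.3f, sd %.3f" % (mean_a, spread_a))
print("Team B: mean %.3f, sd %.3f" % (mean_b, spread_b))
```

Under these assumptions both teams share the same expected result, and team B's standard deviation comes out at roughly twice team A's, which matches the wider histogram.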

The point of this little exercise was to learn a bit about matplotlib and to extend my knowledge of Python. I must say I’m deeply impressed by Python and its ‘ecosystem’: in perhaps less than 10 hours in total, spread over the past few weeks, I’ve been able to do some pretty complex tasks with Python, ranging from large-scale data collection and analysis for social network analysis, through analysis of NMEA and AIS messages, to working with the math, scientific, stats and plotting capabilities of the language.

The power of this type of actively supported open source technology, with an active community, is simply enormous: as soon as I couldn’t figure out how to do something, a simple Google search immediately gave the answer, regardless of whether the problem was where to find a suitable module or how to do a particular type of programming task in Python.

I doubt that many, if any, commercial software products have this type of very active ecosystem available.

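The complete script:

from pylab import zeros, subplot, title, hist, axis, show
from scipy.stats import norm

runs = 10000

# Mean performance per skills dimension (identical for both teams)
teamperfA = [0.9, 0.9, 0.8]
teamperfB = [0.9, 0.9, 0.8]
# Despite the name, these values are used as standard deviations
teamvarianceA = [0.02, 0.05, 0.01]
teamvarianceB = [0.05, 0.1, 0.04]

weights = [1.0, 0.95, 0.8]

def result(perf, variance, weights):
    # Weighted sum of performances, each scaled by a normal draw with mean 1.
    # Note: range(len(perf) - 1) leaves out the last dimension; the axis
    # limits below (1.5 to 2.0) fit that two-term sum. Use range(len(perf))
    # to include all three, and widen the axis limits accordingly.
    res = 0.0
    for i in range(len(perf) - 1):
        res += perf[i] * weights[i] * norm(1, variance[i]).rvs()
    return res

resultsA = zeros(shape=(runs,), dtype=float)
resultsB = zeros(shape=(runs,), dtype=float)

for i in range(runs):
    resultsA[i] = result(teamperfA, teamvarianceA, weights)
    resultsB[i] = result(teamperfB, teamvarianceB, weights)

height = runs

# Same axis limits in both panels so the spreads can be compared directly
subplot(211)
title("Team A")
hist(resultsA, bins=50)
axis([1.5, 2.0, 0, height / 10])

subplot(212)
title("Team B")
hist(resultsB, bins=50)
axis([1.5, 2.0, 0, height / 10])

show()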

About swdevperestroika

High tech industry veteran, avid hacker reluctantly transformed to mgmt consultant.
This entry was posted in development, Math, software.

2 Responses to Python matplotlib, Monte Carlo simulation, and basic statistics

  1. Gene says:

    Absolutely love this example as someone following your foot prints, am learning Python & MatplotLib.

    Quick question.

    I’m getting a Python 2.7 syntax error on the “range” line item within the function you listed:
    —–Begin Code Snippet—-
    def result(perf,variance,weights):
    res = 0.0
    for i in range(len(perf) – 1):
    res += perf[i] * weights[i] * norm(1,variance[i]).rvs()
    return res
    —–End Code Snippet—-

    Comments, please.
    Thank you.
