The Metropolis-Hastings MCMC algorithm, implemented in Python

I’ve been using MCMC for quite a few years now, but always as a “Black Box”: I’ve used tools such as PyMC or Stan that implement different MCMC algorithms. Until now, I’ve never opened up the Black Box; I’ve just used the cool functionality without much thought about how it actually works. That is, my usage of MCMC has thus far been at a “high level of abstraction”.

But now is the time to open up that Black Box of MCMC, to go down a few levels of abstraction, to see if we can understand how that Black Box actually works. More specifically, we’ll dive into a flavor of MCMC called “Metropolis-Hastings”, see if we can figure out how it works, and whether we can code an implementation of that algorithm in Python.

Those of you who would like to understand a bit more about the history of Metropolis-Hastings (a very interesting and important history!) can either google it, or have a look here.

The problem that MCMC attempts to solve is to figure out an unknown probability distribution by “walking around” that distribution. That is, we have somehow gotten hold of a probability distribution that we know nothing about, and we want to figure out what that distribution looks like. We do that, amazingly, by randomly “walking around” that distribution and recording (sampling) the “stuff” we stumble into.

It’s really a bit like walking around in a pitch-black room we know nothing about, periodically flipping a coin to determine which way to go next, and “feeling” our way around the room, hoping to gain an understanding of its layout.

Enough background.

It turns out that at its most basic level, the Metropolis-Hastings algorithm is amazingly simple. Here it is:

### Metropolis-Hastings MCMC algorithm

import numpy as np

# 'target' is a probability distribution, that is, a function whose values
# over all possible outcomes sum to 1, here implemented as a Python function.

size = 10000 # length of MCMC random sampling

walk = np.zeros(size) # array of samples

all_proposed = np.zeros(size)
all_current = np.zeros(size)

walk[0] = 3 # initialize first step with dummy value to get MCMC walk started

# the random walk
for i in range(1, size):
    current = walk[i-1]
    all_current[i] = current
    proposed = current + np.random.normal(0, 1) # propose a new position
    all_proposed[i] = proposed

    A = target(proposed) / target(current) # accept ?

    # A is the ratio of probabilities for the proposed outcome vs the current
    # outcome, according to the target distribution.
    # If A >= 1: always accept. If A < 1: accept with probability A, that is,
    # accept if A is greater than a uniform random number in [0, 1).

    # Proposals outside the domain of the outcome variable are never accepted:
    # for instance, if current is 5 (the domain maximum) and proposed is 6,
    # then, since all values outside the domain have probability 0, A == 0.
    # This can be seen in the trace plot, where proposals occur outside the
    # domain [0..5] but are never accepted.

    if np.random.uniform(0, 1) < A:
        walk[i] = proposed # accept
    else:
        walk[i] = current # reject: stay at current position

That’s it! Amazing!

The array named “walk” will, after execution, contain samples from that unknown distribution, in proportion to their actual probability.

To better see what that means, let’s first look at the unknown distribution, i.e. our “dark room”, which, ahead of running Metropolis-Hastings, we know absolutely nothing about:

[Figure: the (for us) unknown probability distribution, our “dark room” (MH-homemade_dist)]

The above is a “home made” dummy distribution: not a Gaussian, not a Poisson, not a Binomial, nor anything else from the set of traditional probability distributions, but just a function where the probabilities of all possible outcomes add up to 1, as must be the case for any probability distribution.
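For concreteness, a “home made” distribution like that might be sketched as below. This is a hypothetical stand-in: the post does not give the exact probabilities, so the weights here are illustrative only, chosen so that the six outcomes 0..5 have monotonically increasing probabilities that sum to 1.

```python
import numpy as np

# Hypothetical sketch of a "home made" discrete distribution over the
# outcomes 0..5. The actual weights used in the post are not given;
# these are illustrative only.
WEIGHTS = np.array([1, 2, 3, 4, 5, 6], dtype=float)

def target(x):
    """Probability of outcome x; zero outside the domain [0..5]."""
    k = int(np.round(x))  # treat outcomes as the integers 0..5
    if k < 0 or k > 5:
        return 0.0        # zero probability outside the domain
    return WEIGHTS[k] / WEIGHTS.sum()  # probabilities sum to 1
```

Any function of this shape works as a `target` for the Metropolis-Hastings loop above, as long as its probabilities sum to 1 and it returns 0 outside the domain.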

Of course, we could also have used one of the standard probability distributions as our unknown distribution (our “dark room”), but I implemented my own, to better understand what actually goes on under the hood.

Now, what we want to do is to learn the “layout” of that dark room, that is, to figure out the proportions of the different outcomes. We do so by running MCMC, in this case the Metropolis-Hastings implementation above, and we obtain the trace below:

[Figure: trace of the random sampling walk (MH-random-sampling-walk)]

The x-axis represents each of the 10000 steps in our random walk, and the y-axis represents the outcome at that step, i.e. the value sitting in that dark corner of our dark room where we now happen to be. The dark spots represent values (outcomes) that our algorithm has accepted as being representative of the unknown distribution; the (barely visible) orange spots represent “stuff” (values) that our algorithm has proposed. Some of those proposals are accepted, others rejected.

As you can see, any proposed value under 0 or above 5 is rejected, because the domain of the outcomes is [0..5]. We can also see that high values (up to 5) seem to occur more frequently than lower values, which is what we should expect, given that the (for us unknown) probability distribution above has monotonically increasing probability for higher values.

But to really see that our algorithm has managed to track the actual probabilities from the unknown distribution, let’s plot a histogram over the outcomes:

[Figure: histogram of the posterior samples (MH-posterior-samples)]

This histogram shows the 6 possible outcomes and their proportions. If you compare that profile to that of the unknown distribution, you can see that the Metropolis-Hastings algorithm has been able to figure out the proportions of the unknown distribution very well, and, what’s amazing to me, it has done so without any prior / additional information – it has managed to do that by simply randomly walking around in that unknown distribution!
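The whole experiment can be sketched end to end: run the walk against a hypothetical discrete target on the outcomes 0..5 and compare the sample proportions to the target probabilities. Note that the weights and proposal width here are assumptions, not the post’s exact values.

```python
import numpy as np

# Self-contained sketch of the experiment. The target weights below are
# illustrative assumptions; the post's actual distribution may differ.
rng = np.random.default_rng(42)
weights = np.array([1, 2, 3, 4, 5, 6], dtype=float)

def target(x):
    # probability of the (rounded) outcome; zero outside the domain [0..5]
    k = int(np.round(x))
    return weights[k] / weights.sum() if 0 <= k <= 5 else 0.0

size = 10000
walk = np.zeros(size)
walk[0] = 3  # dummy initial value
for i in range(1, size):
    current = walk[i - 1]
    proposed = current + rng.normal(0, 1)  # random-walk proposal
    if rng.uniform(0, 1) < target(proposed) / target(current):
        walk[i] = proposed  # accept
    else:
        walk[i] = current   # reject

# count how often each outcome 0..5 was visited
counts = np.bincount(np.round(walk).astype(int), minlength=6)
print(counts / size)  # should roughly match weights / weights.sum()
```

With 10000 steps, the sample proportions land close to the target probabilities, which is exactly the comparison the histogram above illustrates.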

Very cool!

Finally, let’s limit the number of steps of the random walk to just a handful to see how the accept/reject part of the algorithm works:

[Figure: the first 20 steps of the random sampling walk (MH-random-sampling-walk)]

Here we have 20 steps. The first blue dot on the left, at (0, 3), is a “dummy” that I’ve inserted as the initial value, at step 0. For the next 5 steps, the algorithm accepts the proposal of a new “position”, as shown by the dots residing on the line. However, for the next 2 steps, the algorithm rejects the proposals and stays at the current position (3).

Way cool!

The final output we want, that is, the knowledge of the proportions of the different outcomes, is simply given by the accepted samples: the share of samples landing on each outcome estimates that outcome’s probability.
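Turning the samples into those proportions is a one-liner with `np.unique`. The `walk` array here is a tiny stand-in for illustration; in the post it is of course produced by the Metropolis-Hastings loop above.

```python
import numpy as np

# Stand-in samples for illustration; in practice 'walk' comes from the
# Metropolis-Hastings loop.
walk = np.array([3, 3, 4, 5, 5, 5, 2, 4, 5, 4])

# share of samples per outcome = estimated probability of that outcome
values, counts = np.unique(walk, return_counts=True)
estimated_p = counts / len(walk)
print(dict(zip(values.tolist(), estimated_p.tolist())))
# → {2: 0.1, 3: 0.2, 4: 0.3, 5: 0.4}
```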

[ Inspiration to this post came from https://stephens999.github.io/fiveMinuteStats/MH_intro.html ]
