## MCMC “Internals” – fitting the data

Lately, I’ve been puzzling over how exactly MCMC fits the data to the likelihood – that is, how that fitting is actually done by the various tools implementing MCMC. So, this post is mainly about MCMC internals, not about how to use MCMC for modeling.

A while ago, I wrote about the Metropolis-Hastings random walk, and showed how simple an implementation of it can be (at its most basic level, without any bells and whistles).

But now I wanted to understand how MCMC might internally perform the fitting of the observed data to the likelihood function. I have not been able to find any details about how actual, “industrial strength” MCMC samplers such as Stan or PyMC do it, so here I’m just trying out some “invented here” techniques that might (or might not) be anywhere near what’s done in professional implementations.

So we will hack a simulation, and see if we can implement an MCMC engine capable of even the most basic data fitting, basically without using anything other than “normal” Python libraries.

First we need data, and that’s simple: let’s just use a random number generator and ask it for data based on a Poisson distribution with an expected value (EV) of 100 (for convenience, I’m using PyMC’s random number generator here, but you could use any one you like, such as SciPy’s):

### Simulated Data

```python
import pymc as pm   # PyMC2-style random-number helpers
import numpy as np

SIZE = 5        # number of data points
true_mu = 100   # the "true" expected value

data = pm.rpoisson(mu=true_mu, size=SIZE)
```

So, the above code simulates the “true” data, that is, data that we have observed in the real world, e.g. the number of daily confirmed Corona cases. The snippet generates five such data points – [94, 93, 100, 114, 97] in this particular run – with an EV of 100, and we can imagine these five points representing the number of confirmed cases per day, over 5 days.

So, we can say that this snippet of code represents/simulates our “Process under observation”, i.e. the real world.
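
If you’d rather avoid PyMC here, the same simulation can be sketched with NumPy alone (the seed is my addition, purely for reproducibility):

```python
import numpy as np

SIZE = 5        # number of observed data points
true_mu = 100   # the "true" EV, unknown in a real setting

rng = np.random.default_rng(42)             # seeded for reproducibility
data = rng.poisson(lam=true_mu, size=SIZE)  # five simulated daily counts
```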

Now, since we are simulating the data here, we *know* the true value of the EV, implemented in the code as true_mu = 100. But in reality that value is unknown to us; the only thing we can actually observe from our process is the five data points, nothing else. Therefore we need some assumptions, one of them being that the process could perhaps be modeled by a Poisson distribution, since we are dealing with counts.

To do so, using Bayesian inference, we simply assume that the process behind the observed data is Poisson – that is, we believe the data we observe are being generated by a Poisson distribution with an EV that is unknown to us.

And we would like to find out what that EV might be.
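
To make “finding the EV” a bit more concrete: under the Poisson assumption, every candidate EV assigns a likelihood to the observed data, and candidates near the truth score higher. A small SciPy sketch, using the five data points quoted above:

```python
import numpy as np
from scipy.stats import poisson

data = np.array([94, 93, 100, 114, 97])  # the observed counts from above

def log_likelihood(mu):
    # log P(data | mu) under an i.i.d. Poisson model
    return poisson.logpmf(data, mu).sum()

# candidates near the data's mean (~100) score higher than distant ones
print(log_likelihood(100), log_likelihood(50), log_likelihood(150))
```

MCMC will explore this surface by random walk, rather than maximizing it directly.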

To do that, we’ll implement our own MCMC Sampler, and use it to fit the data. And it’s this bit that I was curious about, how the fitting might be implemented.

I’ve been browsing the net for a while without finding much good info on the topic, but a few days ago I found a couple of links that provided some input on the problem, and what follows is based on that input.

The overall strategy is:

• Assume the observed data is generated by some distribution – in our case, Poisson.
• Define a Generator function that takes the EV as input and produces data samples, as many as in the observed data.
• Define an Error/Loss function that computes how far the generated data are from the observed data.
• Define a Likelihood Ratio function that computes the ratio of two errors.
• Define an MCMC walker (Metropolis-Hastings) that repeatedly generates new proposals for the EV and passes them to the Generator.

Basically, the core of the implementation is a loop in which the Metropolis-Hastings algorithm repeatedly generates a new proposal for the EV, calls the Generator to obtain samples based on that EV, calls the Error/Loss function to compute the error of those samples, and computes the ratio of EV_proposed vs EV_current using the Likelihood Ratio function. Based upon that ratio, it decides whether or not to accept the newly proposed EV as the next state. This sequence of events constitutes the random walk, which ultimately will reflect the distribution of the unknown parameter – in our case, the EV of the Poisson distribution.

Simple, right…?! 😉

Perhaps it becomes clearer if we look at some code:

### The Generator

```python
def generator(param, size=SIZE):
    # draw `size` Poisson samples with expected value `param`
    return pm.rpoisson(mu=param, size=size)
```

Nothing fancy here, just a function that for convenience uses PyMC’s random number generator to provide random Poisson numbers – five of them in our case.

Observe the parameter ‘param’ in the signature: it’s the value – or more correctly, the distribution of plausible values for that parameter, given the data we have observed – that we are really after here.
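
For completeness, a NumPy-only equivalent of the Generator could look like this (a sketch; the `default_rng` usage is my own choice):

```python
import numpy as np

SIZE = 5
rng = np.random.default_rng()

def generator(param, size=SIZE):
    # draw `size` Poisson samples with expected value `param`
    return rng.poisson(lam=param, size=size)

samples = generator(100)
```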

### The Loss/Error Function

```python
def X2(data, generated, sigma=0.3):
    # sum of squared errors, scaled by 2 * sigma^2
    return (((data - generated) ** 2) / (2 * sigma ** 2)).sum()
```

The error function is a version of the squared error sum. Note the ‘sigma’ parameter in the signature: in order to make a ‘real’ MCMC sampler, and not a toy as I’m doing here, I believe there must be some intelligent relationship between the data, the error, and that sigma parameter. But for the purposes of a simplistic demo of toy MCMC internals, I’m not going into details about it, beyond saying that the value of sigma matters a great deal – set it too small, and your runtime will blow up; set it too large, and *any* proposed EV, plausible or not, will be accepted.
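
To illustrate the effect of sigma, here’s a small standalone experiment (the two functions restated, and the “generated” sample arrays invented for the demo): the very same pair of errors yields an acceptance ratio that collapses to essentially zero for a small sigma, but stays close to 1 for a large one.

```python
import numpy as np

def X2(data, generated, sigma=0.3):
    return (((data - generated) ** 2) / (2 * sigma ** 2)).sum()

def likelihood_ratio(X2_current, X2_proposed):
    return np.exp(-X2_proposed + X2_current)

data     = np.array([94, 93, 100, 114, 97])
current  = np.array([95, 92, 101, 113, 98])   # generated close to the data
proposed = np.array([90, 99, 95, 108, 103])   # generated a bit further off

for sigma in (0.3, 3, 30):
    ratio = likelihood_ratio(X2(data, current, sigma),
                             X2(data, proposed, sigma))
    print(sigma, ratio)   # tiny sigma -> ratio ~ 0, huge sigma -> ratio ~ 1
```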

The same goes for the step size in the Metropolis-Hastings algorithm – I’m sure that in order to do good sampling, the step size must be dynamically tuned, something we will not do here either.

### The Error Ratio function

```python
def likelihood_ratio(X2_current, X2_proposed):
    # exp(X2_current - X2_proposed): > 1 when the proposal has a lower error
    return np.exp(-X2_proposed + X2_current)
```

This Error Ratio function is, together with the Error function, perhaps the most critical part of the implementation, since the two in conjunction provide the input that determines whether a newly proposed value for the walk will be accepted or not. The parameters represent the errors for the current and proposed EV values of the walk, provided by the Metropolis-Hastings algorithm. Basically, there are three types of outcomes to consider:

• Error_Proposed == Error_Current
• Error_Proposed  < Error_Current
• Error_Proposed > Error_Current

In the first case, the Error Ratio function returns 1. In the second case, it returns a value greater than 1. In the third case, it returns a value less than 1.
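
A quick numerical check of the three cases (the ratio function restated so the snippet stands alone):

```python
import numpy as np

def likelihood_ratio(X2_current, X2_proposed):
    return np.exp(-X2_proposed + X2_current)

print(likelihood_ratio(5.0, 5.0))  # equal errors    -> 1.0
print(likelihood_ratio(5.0, 3.0))  # proposed better -> e^2  ≈ 7.39
print(likelihood_ratio(5.0, 8.0))  # proposed worse  -> e^-3 ≈ 0.05
```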

This will become important when we look at the accept/reject-part of the Metropolis-Hastings algorithm:

### Metropolis-Hastings

```python
steps = 50000  # length of the MCMC random walk

burn = steps // 2

walk = np.zeros(steps)  # array of samples

all_proposed = np.zeros(steps)
all_current = np.zeros(steps)

accepted = 0
rejected = 0

# init the first step with a dummy value for the param to get the
# MCMC walk started. A good value gives faster convergence.
walk[0] = 10

error_sigma = 3

error_func = X2

# the random walk
for i in range(1, steps):

    current = walk[i - 1]
    all_current[i] = current

    random_step = pm.rnormal(0, 1)  # tau = 1, i.e. standard deviation 1
    proposed = current + random_step

    all_proposed[i] = proposed

    # compute the error of generated vs observed data,
    # for both the current and the proposed EV
    X2_current = error_func(data, generator(current), error_sigma)
    X2_proposed = error_func(data, generator(proposed), error_sigma)

    # compute the acceptance ratio
    A = likelihood_ratio(X2_current, X2_proposed)

    # if ratio >= 1: always accept. If ratio < 1: accept only if the
    # ratio is greater than a uniform random number in [0, 1] --
    # the smaller the ratio, the smaller the chance of acceptance.
    p = pm.runiform(0, 1)

    if p < A:
        walk[i] = proposed  # accept
        accepted += 1
    else:
        walk[i] = current   # reject, stay put
        rejected += 1
```

That’s pretty much it.
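
For convenience, here is the whole thing consolidated into one runnable NumPy-only script. The logic is the same as above; the `pm.*` helpers are swapped for `numpy.random` equivalents, the diagnostic arrays are dropped, and the seed plus the guard against non-positive proposals are my own additions:

```python
import numpy as np

rng = np.random.default_rng(0)

SIZE = 5
true_mu = 100
data = rng.poisson(lam=true_mu, size=SIZE)  # simulated "observed" counts

def generator(param, size=SIZE):
    # Poisson samples for a candidate EV; guard against non-positive lam
    return rng.poisson(lam=max(param, 1e-9), size=size)

def X2(data, generated, sigma=0.3):
    return (((data - generated) ** 2) / (2 * sigma ** 2)).sum()

def likelihood_ratio(X2_current, X2_proposed):
    return np.exp(-X2_proposed + X2_current)

steps = 50000
burn = steps // 2
error_sigma = 3

walk = np.zeros(steps)
walk[0] = 10              # dummy start value for the EV
accepted = rejected = 0

for i in range(1, steps):
    current = walk[i - 1]
    proposed = current + rng.normal(0, 1)   # random-walk proposal, sd = 1

    X2_current = X2(data, generator(current), error_sigma)
    X2_proposed = X2(data, generator(proposed), error_sigma)

    if rng.uniform(0, 1) < likelihood_ratio(X2_current, X2_proposed):
        walk[i] = proposed
        accepted += 1
    else:
        walk[i] = current
        rejected += 1

posterior = walk[burn:]   # discard the burn-in half
print(posterior.mean(), posterior.std())
```

Trimming away the first half of the walk (the burn-in) and averaging the rest should land in the neighborhood of the true EV of 100.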

Running the program generates a random walk of 50000 steps, each step representing a value for the parameter we are after; that is, the walk constitutes a distribution of plausible values for the unknown quantity we are interested in.

Let’s first look at the random walk: the plot above shows the entire walk, with orange dots representing proposed values and blue dots representing accepted values (the blue ones are hard to see, because they are largely overwritten by the many orange proposal dots, but they are there).

As can be seen, the walk starts from the initial dummy value (10) and rapidly converges towards the region of the true value of the parameter, that is, around 100, where it stabilizes in the range 80–120.

To see the resulting distribution of parameter values, we’ll look at a histogram: the orange bars represent the posterior probability of our parameter, and we can see that it’s centered around 100, which happens to be the true value we used when we simulated our data.

The red plot is the theoretically obtained PMF for the Poisson Distribution with EV 100, and we can see that the shapes are very similar.

So, what have we accomplished here? Basically, a demonstration of how one might implement data fitting for Bayesian MCMC. As noted above, this is just a toy hack, and I’m totally convinced that for a real, “industrial strength” MCMC sampler a lot more code is needed for good performance and quality – but my purpose here was just to figure out the basic logic of data fitting and sampling.

I might or might not return to this topic in the future.

As an endnote: if I’d been living in Los Alamos during the 1940s, had known this stuff, and – perhaps most importantly – could have brought my contemporary laptop with me… I’d most likely have been working with some really impressive guys…! 😉