This final part on Bayesian A/B-testing continues our look at the assumptions, implicit or explicit, that are always in play when building statistical models. In part II we examined the impact of larger data sets on our inferences; in this third and final part we take a brief look at how the prior and its parameters affect the posterior.
Continuing the example from the previous parts, let’s first look at our prior: the Beta distribution that we have centered on 20%, i.e. our prior experience/knowledge/belief from past campaigns tells us that the typical signup-rate is 20%. A Beta distribution centered on 0.20 can be obtained in many ways by choosing suitable values for its parameters alpha and beta, so let’s start by selecting parameters that make the prior relatively “weak”, i.e. that encode more uncertainty in our prior information.
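As a small sketch of what such a “weak” prior might look like numerically: the article does not state the exact parameter values, so alpha=2, beta=8 below is an assumption, chosen only because it gives a mean of 0.20 with a wide spread.

```python
# Hypothetical parameters for a "weak" Beta prior centred on 0.20;
# the exact values used in the article are not stated, so these are assumed.
alpha, beta = 2.0, 8.0

# Closed-form mean and standard deviation of a Beta(alpha, beta) distribution
mean = alpha / (alpha + beta)
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
sd = var ** 0.5

print(f"prior mean = {mean:.2f}, prior sd = {sd:.3f}")  # mean 0.20, sd ~0.121
```

The small pseudo-counts (2 “successes”, 8 “failures”) are what make this prior weak: it is centered on 0.20 but spreads considerable probability across a wide range of signup-rates.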
Above is our “weak” prior. As can be seen, it peaks around 0.20 but has a long, fat tail. It reflects quite a bit of uncertainty, and that’s fine if we genuinely are uncertain about the prior, for whatever reason – perhaps it is based on a very small data set, or on data with a lot of variability, or something else made us unsure about the “true” accuracy and precision of our prior information.
Running the Bayesian inference engine with the above prior produces the posterior below:
In this case, the results indicate that strategy A is slightly better than B, by two percentage points in signup-rate: A shows a signup-rate of 31% while B shows 29%. Perhaps not a big enough difference to serve as the sole basis for a strategy decision.
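Because the Beta prior is conjugate to binomial signup data, the kind of update behind these numbers can be sketched in a few lines. The prior parameters (2, 8) and the observation counts below are made up for illustration; the article’s real data and results differ.

```python
# Conjugate Beta-Binomial update: with a Beta(alpha, beta) prior and
# `signups` successes out of `trials`, the posterior is
# Beta(alpha + signups, beta + trials - signups).
alpha, beta = 2, 8  # assumed weak prior centred on 0.20

def posterior_mean(signups, trials):
    """Posterior mean of the signup-rate after observing the data."""
    return (alpha + signups) / (alpha + beta + trials)

print(round(posterior_mean(32, 100), 3))  # strategy A: ~0.31
print(round(posterior_mean(30, 100), 3))  # strategy B: ~0.29
```

With a weak prior and this much data, the posterior means land close to the raw observed rates, which is the behaviour the figure above illustrates.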
Let’s now imagine that we are more certain about our prior information and therefore want to make the prior narrower. We can do that by changing the alpha and beta parameters of the Beta distribution:
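Keeping the same mean while shrinking the spread amounts to scaling both parameters up. The specific values below are assumptions for illustration, not the article’s actual choices:

```python
def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5

# Same centre (0.20), but far larger pseudo-counts in the second case,
# i.e. a much more confident prior (hypothetical parameter values):
print(beta_mean_sd(2, 8))      # weak prior:   mean 0.20, sd ~0.121
print(beta_mean_sd(100, 400))  # narrow prior: mean 0.20, sd ~0.018
```

The ratio alpha/(alpha+beta) fixes where the prior is centered, while the magnitude alpha+beta acts like a pseudo-sample size controlling how concentrated it is.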
Now, the prior is still centered on 0.20, as we want, but it is much narrower: the long, fat tail is gone, meaning there is less variability, reflecting our greater certainty about the prior information.
Running the model with this prior results in the graph below:
Now the results indicate that, in terms of signup-rates, the two strategies are more or less identical, both centering on 22%. That is, incorporating a “stronger” prior, with less variability, has pulled our expectation away from what the data themselves indicate and towards the prior. By narrowing the prior, we are essentially telling the model: “we are fairly certain about what the outcome will be, even before you have seen the data, so please adjust your results to reflect that fact!”.
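This pull towards the prior can be shown with the same conjugate update. The strong prior Beta(100, 400) and the observation counts below are assumptions made for illustration, not the article’s actual parameters or data:

```python
# Conjugate Beta-Binomial update with a strong, assumed prior Beta(100, 400),
# centred on 0.20. The data counts are made up for illustration.
alpha, beta = 100, 400

def posterior_mean(signups, trials):
    """Posterior mean of the signup-rate after observing the data."""
    return (alpha + signups) / (alpha + beta + trials)

# The prior's 500 pseudo-observations dominate the 100 real ones,
# so both strategies get pulled from ~0.31/0.29 towards the prior's 0.20:
print(round(posterior_mean(32, 100), 3))  # strategy A: ~0.22
print(round(posterior_mean(30, 100), 3))  # strategy B: ~0.22
```

In other words, the posterior mean is a weighted average of the prior mean and the observed rate, with weights proportional to the prior’s pseudo-counts and the actual sample size.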
To summarize: Bayesian inference is a very powerful tool, not least for A/B-testing, allowing models to be fit even to very limited data sets. Obviously, the more data the better, but Bayesian inference does not rely on “magical numbers” such as the Frequentist rule of thumb that the sample size must exceed 30, or on the more or less equally magical p-values. Furthermore, with Bayes you get full distributions rather than point estimates, which gives you more information for making an informed decision – after all, that is our objective: to decide which strategy among competing strategies to choose, and to make that decision as informed and evidence-based as possible.
To use Bayesian inference on real-world rather than toy problems, the combinatorial explosion within the “Garden of bifurcating paths” means you will need algorithms based not on Approximate Bayesian Computation (ABC) but on MCMC. Luckily, there are Python modules available for MCMC, such as PyMC3 and PyStan, of which I’ve been using PyStan.
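To give a feel for the MCMC idea itself (this is a minimal stand-in sketch, not the PyStan code used in this series), a random-walk Metropolis sampler for a single signup-rate under an assumed Beta(2, 8) prior and made-up data could look like this:

```python
import math
import random

def log_post(p, a, b, signups, misses):
    """Log of the unnormalised Beta posterior density for signup-rate p."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return (a + signups - 1) * math.log(p) + (b + misses - 1) * math.log(1 - p)

def metropolis(a, b, signups, misses, steps=20000, step=0.05, seed=1):
    """Random-walk Metropolis sampler over the signup-rate."""
    rng = random.Random(seed)
    p, samples = 0.5, []
    for _ in range(steps):
        prop = p + rng.gauss(0.0, step)  # propose a nearby value
        delta = (log_post(prop, a, b, signups, misses)
                 - log_post(p, a, b, signups, misses))
        if delta >= 0 or rng.random() < math.exp(delta):
            p = prop  # accept the proposal
        samples.append(p)
    return samples[steps // 2:]  # discard the first half as burn-in

# Assumed weak prior Beta(2, 8) and made-up data: 32 signups out of 100 visits.
draws = metropolis(2, 8, signups=32, misses=68)
print(sum(draws) / len(draws))  # close to the analytic posterior mean 34/110
```

Real samplers such as Stan’s NUTS are far more efficient than this random walk, but the principle is the same: draw correlated samples whose long-run distribution is the posterior, then summarize those draws instead of computing the posterior analytically.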