Below are trend lines for all 8 current parliamentary parties, from the 2014 election to January 2018.

First, an unsmoothed, “noisy” presentation:

This type of unsmoothed trend line quickly becomes unreadable, so if you suspect, as I do, that the underlying data is “noisy”, it is reasonable to smooth the trend line somewhat with a suitable smoothing function, in my case a windowing function based on a Hanning filter. The trend data then looks like this:
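As a rough illustration of this kind of smoothing, here is a minimal sketch of a Hann-weighted moving average in plain Python. The window formula is the standard Hann (“Hanning”) window; the window size, the edge handling and the poll values are assumptions made up for the example, not the settings or data behind the graphs.

```python
import math

def hann_window(size):
    """Return Hann window weights of the given size, normalised to sum to 1."""
    if size < 3:
        raise ValueError("size must be at least 3")
    # Symmetric Hann window: zero at both ends, peak in the middle.
    raw = [0.5 - 0.5 * math.cos(2 * math.pi * i / (size - 1)) for i in range(size)]
    total = sum(raw)
    return [w / total for w in raw]

def smooth(series, size=7):
    """Smooth a series with a Hann-weighted moving average.

    Near the edges, only the weights that fall inside the series are used
    (and renormalised), so the output has the same length as the input.
    """
    weights = hann_window(size)
    half = size // 2
    out = []
    for i in range(len(series)):
        acc = 0.0
        wsum = 0.0
        for j, w in enumerate(weights):
            idx = i + j - half
            if 0 <= idx < len(series):
                acc += w * series[idx]
                wsum += w
        out.append(acc / wsum)
    return out

# Made-up monthly poll shares (percent) for one party, for illustration only.
polls = [5.1, 4.2, 5.8, 4.9, 3.7, 4.4, 5.0, 3.9, 4.6, 4.1]
smoothed = smooth(polls, size=7)
print([round(v, 2) for v in smoothed])
```

A wider window gives a calmer line but reacts more slowly to real shifts in opinion, so the window size is a judgement call.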

Here the overall trend is easier to see; among other things, we can see that KD will probably drop out of parliament, provided that the “Comrade 4% phenomenon” (Kamrat 4%) does not materialize. MP and L are also walking a tightrope.

If we instead look at the blocs, Red-Green vs. the Alliance, a smoothed trend looks like this:

We now have three distinct blocs in Swedish politics, and not least given that one or more bloc parties risk dropping out of parliament in the coming election, forming a government after the election in September is likely to be really challenging.

This final part on Bayesian A/B-testing will continue looking at the various assumptions, implicit or explicit, that are always in play when building statistical models. In part II, we looked at what impact larger data sets have on our inferences. In this third and final part, we will take a brief look at what effect our prior and its parameters have on our posterior.

Continuing the same example from the previous parts, let’s first look at our prior, the Beta distribution that we have centered around 20%; that is, our prior experience/knowledge/belief from past campaigns tells us that the typical signup-rate is 20%. We can obtain a Beta distribution centered on 0.20 in many ways by selecting suitable values for its parameters alpha and beta, so let’s start by selecting these parameters such that our prior is relatively “weak”, i.e. there is more uncertainty in our prior information.

Above is our ‘weak’ prior. As can be seen, the prior peaks around 0.20, but has a long and fat tail. This prior reflects quite a bit of uncertainty in our prior, and that’s fine, if we indeed are uncertain about the prior, for whatever reason – perhaps we have based our prior on a very small set of data, or there might be much variability in those data, or whatever it was that made us fairly uncertain about the “true” accuracy and precision of our prior data.

Running the Bayesian inference engine with the above prior produces the posterior below:

In this case, the results indicate that strategy A is a little bit better than B, by 2 percentage points in terms of signup-rate, A indicating a signup-rate of 31% while B shows up as 29%. Perhaps not a big enough difference to use as a sole basis for a strategy decision.

Let’s now imagine that we are a bit more certain about our prior info, and thus want to make our prior more narrow. We can do that by changing the alpha and beta parameters of the Beta distribution:

Now, the prior is still centered around 0.20, as we want, but it is also much narrower, i.e. the long fat tail is gone, meaning that in this case, variability is less, reflecting our greater certainty about the prior information.
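To make “same center, different width” concrete, here is a small sketch that compares two Beta priors with the same mean using the standard formulas for the mean and standard deviation of a Beta distribution. The parameter pairs are assumptions: Beta(2, 8) is the pair mentioned elsewhere in this series, and Beta(20, 80) is a hypothetical “strong” prior chosen only because it has the same mean.

```python
import math

def beta_summary(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(var)

weak_mean, weak_sd = beta_summary(2, 8)        # assumed "weak" prior
strong_mean, strong_sd = beta_summary(20, 80)  # hypothetical "strong" prior

print(f"weak   Beta(2, 8):   mean={weak_mean:.2f}, sd={weak_sd:.3f}")
print(f"strong Beta(20, 80): mean={strong_mean:.2f}, sd={strong_sd:.3f}")
```

Scaling alpha and beta by the same factor keeps the mean fixed while shrinking the spread, which is one simple way to encode “the same belief, held more firmly”.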

Running the model with this prior results in the below graph:

Now, the results indicate that in terms of signup-rates, the two strategies are more or less identical, both centering on 22%. That is, incorporating a “stronger” prior, with less variability, has now pulled our expectation towards the prior and away from what the data themselves indicate. In this case, by narrowing the prior, we are essentially saying to the model that “we are fairly certain about what the outcome will be, even before you have seen the data, so please adjust your results to reflect that fact!”.
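One way to see why a narrower prior pulls the result toward 20% is the textbook conjugate shortcut for a Beta prior with binomial data: the posterior is Beta(alpha + k, beta + n - k), so its mean is (alpha + k) / (alpha + beta + n). The sketch below uses that closed form rather than MCMC. The data (6 of 16 and 8 of 25) come from the running example; the two priors are assumptions chosen to have mean 0.20 and are not necessarily the exact parameters behind the graphs.

```python
def posterior_mean(alpha, beta, n, k):
    """Posterior mean of the signup-rate for a Beta(alpha, beta) prior
    and k signups out of n respondents (conjugate Beta-binomial update)."""
    return (alpha + k) / (alpha + beta + n)

data = {"A": (16, 6), "B": (25, 8)}          # (respondents, signups)
priors = {
    "weak Beta(2, 8)": (2, 8),               # assumed weak prior
    "strong Beta(20, 80)": (20, 80),         # hypothetical strong prior
}

for label, (alpha, beta) in priors.items():
    for name, (n, k) in data.items():
        mean = posterior_mean(alpha, beta, n, k)
        print(f"{label:20s} strategy {name}: posterior mean = {mean:.3f}")
```

The prior acts like alpha + beta imaginary earlier respondents: with 10 of them the real 16 and 25 responses dominate, with 100 of them the prior dominates.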

To summarize: Bayesian inference is a very powerful tool, not least for A/B-testing, allowing models to be fit even to a very limited data set. Obviously, the more data you have, the better, but Bayesian inference does not suffer from “magical numbers” like e.g. the Frequentist maxim of “minimal sample size must exceed 30”, or more or less equally magical p-values. Furthermore, with Bayes you get distributions, not point estimates, which means that you’ll have more information to make an informed decision. After all, that is our objective: to decide which strategy among competing strategies to choose, and we want that decision to be as informed and evidence-based as possible.

To use Bayesian inference on real-world rather than toy problems, you will need algorithms based not on Approximate Bayesian Computation (ABC) but on MCMC, because of the combinatorial explosion within the “Garden of bifurcating paths”. Luckily, there are modules available for MCMC in Python, such as PyMC3 and PyStan, of which I’ve been using PyStan.

In this part I continue my example by examining how the different assumptions of the model (yes, in any model there are always assumptions, explicit or implicit) impact the end result, that is, the prediction of the sought-after signup-rate, a.k.a. our posterior probability distribution.

We have several assumptions within our model: first of all, I’ve chosen the Beta-distribution as the prior distribution, and already that choice in itself is an assumption. For the generator function, I’ve chosen the binomial distribution, yet another assumption. But for this problem, both of these assumptions make sense, so let’s stick with them.

Further assumptions arise from the parameters for the functions: for the prior Beta, we need to specify alpha and beta, which define the “shape” of the distribution. In our example, we decided that the prior should be centered around 20%, so let’s start by setting alpha to 2 and beta to 8, which gives a Beta distribution with mean 2 / (2 + 8) = 0.2.

Let’s also start with values for number of trials (n) and number of successes (k) as in the previous post, that is, the data reported by our preliminary market research found that for strategy A, 6 out of 16 respondents signed up, while for strategy B, 8 out of 25 respondents signed up.

Let’s first look at the prior distribution, what it looks like with alpha = 2, and beta = 8:

This prior, which represents our existing knowledge or belief on the likelihood of people signing up for our various offers, is centered around 0.2, that is, with a mean signup-rate of 20%, but as can be seen from the graph, the distribution is pretty wide, with most likely probabilities ranging from ~15% to ~30%. That is, there’s a fair amount of uncertainty in this prior. That in itself is neither good nor bad; if there is indeed much uncertainty in our prior, then we would want the prior distribution to reflect that fact by being wide. On the other hand, if we are confident that our prior knowledge is precise, i.e. that the prior should be very narrow, then the above prior does not reflect that belief/knowledge.

But let’s stick with this fairly wide prior for now, and see what it generates:

The first run of my inference engine used the prior discussed above, and the data input to the binomial generator was (16,6) for strategy A and (25,8) for strategy B. After chewing on these inputs for a very short time, my Python/PyStan/Stan program came up with the above posterior belief on the signup-rate:

For strategy A, the mean probability for sign-ups was found to be about 30%, while for strategy B, the signup probability was about 28%. That’s a narrow difference of only 2 percentage points, and probably not differentiated enough for us to make a decision on, particularly considering that the amount of data is quite limited: 16 and 25 data points, respectively, for the two strategies.

So, after analyzing this result, we decide that we need more data. Thus, we conduct another market poll, and this time a larger one, where for strategy A we have 160 responses, with 60 of them signing up, and for strategy B we get 250 responses, with 80 of them signing up. Let’s put these new numbers into the program, while keeping the existing prior:

Now, with 10 times as much data to fit our model to, the two alternatives start to move apart from each other: now Strategy A gets almost 37% signups, while strategy B gets about 32%, a difference of about 5 percentage points.

Does this poll size give us enough confidence to decide on strategy A? Perhaps. First of all, it depends on whether a difference of 5 percentage points makes a significant difference to your business; that’s up to you to decide. Secondly, how can we be sure that we now have enough data? Well, one way to answer the second question is to run the program again, with even larger n and k, and see what happens. So let’s multiply the numbers of trials (n) and successes (k) once again by a factor of 10:

Now, for strategy A, we have 1600 trials, out of which 600 sign up, and for B, we have 2500 trials, with 800 signups. As can be seen from the graph, by increasing our data by factor 10, we get almost the same result as before. Thus, perhaps we are now justified to determine that the results from our previous poll, with 160 vs 250 trials give us good enough results to base our decision upon.
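The effect of growing the data set can also be checked without MCMC, using the standard conjugate result that a Beta(alpha, beta) prior combined with k successes in n trials gives a Beta(alpha + k, beta + n - k) posterior. The sketch below is an illustration under that shortcut, not the Stan program used for the graphs; it prints the posterior mean and standard deviation for the three data sizes discussed above with the Beta(2, 8) prior.

```python
import math

def beta_binomial_posterior(alpha, beta, n, k):
    """Return (mean, sd) of the Beta(alpha + k, beta + n - k) posterior."""
    a = alpha + k
    b = beta + n - k
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

ALPHA, BETA = 2, 8  # the prior used in this post
results = {}
for scale in (1, 10, 100):
    for name, (n, k) in {"A": (16, 6), "B": (25, 8)}.items():
        mean, sd = beta_binomial_posterior(ALPHA, BETA, n * scale, k * scale)
        results[(name, scale)] = (mean, sd)
        print(f"strategy {name}, n={n * scale:5d}: mean={mean:.3f}, sd={sd:.3f}")
```

As n grows, the posterior means move toward the raw rates 6/16 and 8/25 and the standard deviations shrink, which is why the second tenfold increase changes the picture much less than the first.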

Of course, there’s always a risk that, should we decide to expand our preliminary market research with a poll yet another factor of 10 larger, we might get different answers. Perhaps our data collection has been biased, and in that case a larger data set could smooth out the outliers better than a smaller one. On the other hand, time (and resources spent) is money, and if we are looking for the “perfect” answer, we will never cross the finishing line. After all, “Prediction is Difficult, Particularly about the Future”, so we need to make as informed a decision as we can with the data we can collect with reasonable time and effort. And with Bayesian methods, we get more than just the point estimates of traditional, frequentist methods, so having found that there’s not really much point in conducting yet another, larger and more expensive market research activity, at least I myself would be willing to commit to a decision based on the findings of this analysis.

Inspired by Rasmus Bååth’s lectures on Bayesian Inference, I’ve implemented a simple Python example demonstrating how Bayesian Inference can be used for A/B-testing, that is, evidence-based testing. This methodology, i.e. A/B-testing, is useful in most domains, e.g. development of pharmaceuticals, marketing campaigns, medical treatments, Search & Rescue strategies, legal court hearings, sales strategies or any other domain where there is a need for evidence-based predictions of which potential strategy (A or B) will produce better results.

The scene for this demo is as follows (for simplicity, I’m using one of the examples from Rasmus’ lectures as the basis for the scenario): Assume that you are responsible for taking a decision on which of two different sales strategies your company will take. You have conducted preliminary market research, taking (hopefully!) random samples from your potential customer base, using strategy A for half of the random samples, and strategy B for the rest of the samples. These preliminary results state that for strategy A, 6 out of 16 respondents signed up for your offer (whatever you are trying to sell), while for strategy B, 8 out of 25 respondents signed up (I’m deliberately keeping the number of respondents and successes very low for this simple example, for reasons that will become clear a bit further down).

So, how should we go about deciding which of these two strategies is the better, i.e. which of the strategies will provide us with the most sign-ups in the long run? Or indeed, do these results even bear enough significance to base a decision upon them?

Well, one way to use the data from the poll is simply to figure out the rate of success for each of the two strategies: strategy A had 6 out of 16 successes, which translates to about 38%, while strategy B had 8 successes out of 25, which is about 32%.

So, clearly, if we just look at these two numbers, strategy A looks better by 6 percentage points. But with the quite limited amount of data (16 and 25 data points), would you be willing to bet your shop on these simple point estimates…? I wouldn’t.

So instead of relying just upon simple data points, let’s use Bayesian inference to produce not only single data points, but a probability distribution:

This first graph below shows the probability distributions for our two competing strategies, A and B (the “ABC” in the title of the graph tells us that this graph is produced by Approximate Bayesian Computation, a conceptually great but computationally very heavy implementation of Bayesian Inference). From the graph we can see that indeed strategy A looks better, with a most likely range of about 25-45%, while strategy B has a range of 20-35%. So clearly, were we forced to choose between these two strategies, strategy A indeed looks better, and furthermore, now, after having done our Bayesian analysis (or more correctly: having let our computer do it) we can probably be a bit more confident about our decision, as opposed to relying only upon the two point estimates above.
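For readers who want to see the idea in code, here is a minimal sketch of ABC by rejection sampling: draw a candidate signup-rate from the prior, simulate a poll with it, and keep the candidate only if the simulated poll reproduces the observed number of signups. It assumes a uniform prior and exact matching, and it is not the program that produced the graph; the function name, the number of draws and the seeds are made up for illustration.

```python
import random

def abc_posterior(n, k, draws=20000, seed=1):
    """Approximate the posterior of a signup-rate by rejection sampling.

    n: number of respondents, k: observed signups.
    Returns the list of accepted candidate rates.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        p = rng.random()  # candidate rate from a uniform (uninformative) prior
        # Simulate n respondents who each sign up with probability p.
        simulated = sum(rng.random() < p for _ in range(n))
        if simulated == k:  # keep only candidates that reproduce the data
            accepted.append(p)
    return accepted

post_a = abc_posterior(16, 6)
post_b = abc_posterior(25, 8, seed=2)
mean_a = sum(post_a) / len(post_a)
mean_b = sum(post_b) / len(post_b)
print(f"A: {len(post_a)} accepted draws, mean {mean_a:.3f}")
print(f"B: {len(post_b)} accepted draws, mean {mean_b:.3f}")
```

Most candidates are thrown away, and the fraction kept falls as n grows, which is the practical weakness of this approach.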

Now, the above analysis was done without any previous opinion about the likelihood of people signing up for our sales campaigns. Instead, it was produced with something called an “uninformative prior”, that is, we didn’t have any prior data/statistics/opinion on how likely it is for people to sign up for our various marketing/sales campaigns.

So, let’s instead say that we in fact know (or believe we know 🙂) that previous similar campaigns tend to result in a 20% signup-rate. That is, in the past, with similar campaigns, 1 in 5 people addressed by our efforts have indeed signed up. How can we use this information in our analysis?

In Bayesian Inference, we can instead of using an uninformed prior (as we did above) give our inference engine an informative prior, in our case here, the knowledge that in the past, 1 in 5 people have signed up.

Running the same model again, this time with an informative prior, the results look a bit different:

This time, with our knowledge about the success rate for previous campaigns, we can see that the two different strategies are no longer so very different in terms of their results: the two probability distributions pretty much overlap fully, and the mean for both is almost identical, about 22% rate of success in terms of signups. The other observation is that now, with the informative prior, i.e. our “apriori belief” based on the results of previous campaigns, the expected success rates for both strategies have been “pulled back”, from 32% and 38%, to 22%. Thus, by adding the informative prior to our model, our expectation has shrunk, quite a bit, and furthermore, we no longer see any significant difference between our two strategies.

We can also illustrate the – in this case, non-existing – difference between our two strategies graphically: As can be seen below, the distribution of difference is clearly centered around zero, indicating that our two strategies are equivalent in terms of success rate, thus neither of them is better than the other.
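A distribution of the difference can be sketched by drawing paired samples from the two posteriors and subtracting them. The example below uses the closed-form Beta posterior and Python’s standard `random.betavariate`; the informative prior Beta(20, 80) is a hypothetical stand-in with mean 0.20, since the exact prior parameters are not stated here.

```python
import random

def difference_samples(prior, data_a, data_b, draws=20000, seed=0):
    """Draw samples of (rate_A - rate_B) from two Beta posteriors.

    prior: (alpha, beta); data_a, data_b: (respondents, signups).
    """
    rng = random.Random(seed)
    alpha, beta = prior
    (n_a, k_a), (n_b, k_b) = data_a, data_b
    diffs = []
    for _ in range(draws):
        p_a = rng.betavariate(alpha + k_a, beta + n_a - k_a)
        p_b = rng.betavariate(alpha + k_b, beta + n_b - k_b)
        diffs.append(p_a - p_b)
    return diffs

# Hypothetical informative prior with mean 0.20; data from the example.
diffs = difference_samples((20, 80), (16, 6), (25, 8))
mean_diff = sum(diffs) / len(diffs)
prob_a_better = sum(d > 0 for d in diffs) / len(diffs)
print(f"mean difference: {mean_diff:.3f}")
print(f"P(A better than B): {prob_a_better:.2f}")
```

The share of samples above zero is a direct estimate of the probability that A beats B, which is often easier to act on than two overlapping curves.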

With this information, a reasonable decision would probably be to modify one or both of the strategies, hoping to find a significant difference, e.g. by setting a threshold, say 5 percentage points, that must be exceeded before deciding upon a specific strategy.

To summarize this part: by using Bayesian Inference, we can make informed decisions based not just on point estimates, but on probability distributions. Furthermore, most often we do indeed have some previous (prior) information that we would like to use in our analysis, and Bayesian methods allow us to do so easily. Of course, it still takes careful analysis of the results, and good judgement, to form a decision based on any stochastic process, but what I particularly like about Bayesian methods is that the resulting distributions clearly show significantly more information than a single p-value or any other point estimate.

Implementation issues

The “ABC” in the title of the graphs above indicates that the results produced thus far were obtained with a Python implementation of “Approximate Bayesian Computation”. ABC is easy to understand and easy to implement: the code that produced the above results is less than 100 lines, and there is nothing very complicated in it.

The problem with ABC is that for almost any real-world problem, it will literally take days or weeks of computation time to get the results. This is because of the combinatorial explosion of possibilities that the program must explore when the numbers of trials and observations grow, the so-called “Garden of bifurcating paths”, neatly demonstrated by Richard McElreath in the video below:

So, for real-world problems, ABC will most likely take way too long to execute. However, as luck has it, smart people in many different domains have figured out really clever algorithms for performing the necessary computations in a fraction of the time compared to ABC. These algorithms are based on Markov Chain Monte Carlo (MCMC).

Not only does an MCMC-based implementation run many orders of magnitude faster than one based on ABC, but MCMC can also easily handle numbers of trials/observations that are orders of magnitude larger than ABC can cope with.
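To give a feel for what an MCMC sampler does, here is a minimal random-walk Metropolis sketch for the signup-rate. This is the simplest member of the MCMC family and only an illustration: Stan itself uses a more sophisticated Hamiltonian Monte Carlo sampler, and the prior Beta(2, 8), the step size and the chain length below are assumptions chosen for the example.

```python
import math
import random

def log_posterior(p, n, k, alpha, beta):
    """Unnormalised log posterior: Beta(alpha, beta) prior, binomial data."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return (alpha - 1 + k) * math.log(p) + (beta - 1 + n - k) * math.log(1 - p)

def metropolis(n, k, alpha=2.0, beta=8.0, steps=20000, burn_in=2000,
               step_size=0.1, seed=0):
    """Random-walk Metropolis sampler for the signup-rate."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, n, k, alpha, beta)
    samples = []
    for i in range(steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_posterior(proposal, n, k, alpha, beta)
        # Accept with probability min(1, posterior ratio); 1 - random() is in (0, 1].
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:  # discard the warm-up part of the chain
            samples.append(current)
    return samples

samples = metropolis(16, 6)  # strategy A: 6 signups out of 16
mean_rate = sum(samples) / len(samples)
print(f"posterior mean for strategy A: {mean_rate:.3f}")
```

Unlike ABC, the sampler never has to reproduce the observed count exactly; it only compares posterior densities, so the cost per step does not depend on how unlikely an exact match would be.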

The graph below is produced using PyStan, a Python interface to Stan, a domain-specific language implementing MCMC:

With PyStan we get (in this case, with fairly small numbers of trials/observations) the same results as we got with ABC. However, if I tried to increase the numbers, my ABC implementation would choke, not finding any solutions, while Stan & PyStan would happily compute the sought-after distributions.

I just managed to finish Dan Brown’s latest novel, “Origin”. It’s hard for me to read Dan Brown, since IMO he writes the same book over and over again, albeit in slightly different settings. And his stereotypical main protagonists are to me less than credible…

Anyways, I used the book for falling asleep in the evenings, but eventually, reaching chapter 90, it suddenly became interesting, with a pretty good presentation of Jeremy England’s research on the origin of life.

The video above is a presentation Jeremy gave at Karolinska Institute a few years ago.

In a couple of previous posts, I’ve tried to wear out my Swedish-speaking audience with predictions regarding the upcoming national elections, using Bayesian Inference.

This post will be a bit more technical, and perhaps of interest to a larger audience interested in Bayesian Inference, so I’ll write it in English. First a short recap:

I started this project, inspired by a great tutorial on Bayesian inference by Rasmus Bååth, followed by lectures on the topic by Richard McElreath, author of an excellent book, “Statistical Rethinking”.

My first forecasts for the upcoming election were produced by ABC, Approximate Bayesian Computation, which is a conceptually very neat model for Bayesian inference, but suffers badly from being computationally very heavy, meaning that it literally takes ages to get decent results for any non-trivial problem.

The below eight predictions, produced with my ABC-program, took about 24 hours of CPU-time to produce, on a dataset consisting of about 40 observations for each of the 8 parties. Furthermore, to keep the computations within any reasonable limits, the ABC model forced me to keep the n-parameter of the binomial fitting model very low, 100, otherwise the combinatorial explosion resulting from larger n-values would have forced me to increase the iterations way beyond any reasonable time limits.
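A back-of-the-envelope way to see why a larger n hurts ABC: if a simulated poll must reproduce the observed count exactly, the chance of a match shrinks as n grows, even when the candidate share is exactly right. The sketch below computes that best-case match probability from the binomial distribution; the 5% share is a made-up example value, and exact matching is an assumption about how the acceptance step works.

```python
import math

def binomial_pmf(n, k, p):
    """P(exactly k successes in n trials), computed in log space so that
    large n does not underflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

share = 0.05  # hypothetical vote share, for illustration only
for n in (100, 1000, 10000):
    k = round(n * share)
    print(f"n={n:5d}: P(simulated count == {k}) = {binomial_pmf(n, k, share):.4f}")
```

The match probability falls roughly with the square root of n, and in a real run most candidate shares drawn from the prior are further from the truth, so the actual acceptance rate is lower still.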

The eight predictions below are thus produced by ABC, with 5,000,000 iterations, and as can be seen, the results, while reasonable, are pretty wide:

That is, the width results from the model not running enough iterations, even though the number of iterations is already 5,000,000 and the run takes 24 hours!

However, there are very clever people who have developed smart algorithms for doing these types of calculations, algorithms running in a fraction of the time of ABC. These smart algorithms are based on Markov Chain Monte Carlo methods, and I’m not even going to pretend that I understand all the nitty gritty detail about *how* they go about optimizing the task, but sure enough, they work like a charm:

The set of 8 predictions below is produced by Stan, a domain-specific language that I interface from Python with PyStan, and now the same model that took 24h of execution with ABC, with a very limited binomial fitting model, runs in about 10 minutes, with a fitting model 100 times larger! That is, my binomial fitting model now has a parameter of 10000 instead of 100, and despite this factor 100 difference in size, the Stan-based model runs about 200 times faster!

Not only did Stan cut out a huge amount of execution time, but it also cut away a lot of the variability of the results, thanks to being able to work with much larger numbers in the binomial distribution.

Interestingly, the results from the two different models are very consistent in terms of the predictions. The main difference is the radically decreased variability in the Stan-based model.

Still the same data, i.e. Sentio’s opinion polls, but with some tweaking of the Bayesian inference model, and also an enlarged scale in the graphics.

As the graphs show, a 95% confidence interval gives a wide result, too wide to be of real use. This is because my model is implemented with approximate Bayesian computation, which means that the model needs an enormous number of iterations in the simulation to produce reasonably narrow probability distributions.

The graphs below were produced with 500,000 iterations, which take about 2 hours to run on my machine. To get a narrower spread, I will rerun the simulation with 5,000,000 iterations, which will probably take more than a day of CPU time.

Despite the wide confidence interval, the most likely election result is clearly discernible from the graphs.

As for the blocs, the model says that the Red-Green bloc gets about 40%, while the Alliance gets about 38%, but since the probability that KD ends up below 4% is 70%, and the probability that the Liberals end up below 4% is 40%, it currently looks as if the Red-Green bloc could win by what might best be described as a “walkover”.

So, my program has now chewed its way through the data for all 8 parliamentary parties, and for each party produced a forecast of its vote share in the coming election in September.

The forecasts are made with “Bayesian Inference”, a statistical method that in brief can be said to build on first constructing a model of what you believe about the outcome of a phenomenon, in terms of probability, and then using actual data (in my case Sentio’s opinion polls from the past 3 years) to successively update the model.

The model’s initial “guess” about the election outcome is largely based on the result of the previous election, and this “prior” (as it is called in “Bayesian”) is then updated with the roughly 40 recurring opinion polls.
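The prior-then-update idea can be sketched with the conjugate Beta-binomial shortcut, where each poll simply adds pseudo-counts to the current belief. This is a simplification and not the ABC program behind the forecasts: the prior weight of 100, the effective poll size n = 100 and the four poll values are all assumptions made up for illustration (only the 23.3% starting point is the 2014 result quoted for Moderaterna).

```python
import math

def update_with_poll(belief, poll_share, n=100):
    """Update a Beta(alpha, beta) belief with one poll, treated as
    poll_share * n 'successes' out of n respondents (n is an assumption)."""
    alpha, beta = belief
    k = poll_share * n
    return alpha + k, beta + (n - k)

def summarize(belief):
    """Mean and standard deviation of a Beta(alpha, beta) belief."""
    alpha, beta = belief
    total = alpha + beta
    mean = alpha / total
    sd = math.sqrt(alpha * beta / (total ** 2 * (total + 1)))
    return mean, sd

belief = (23.3, 76.7)               # prior centred on 23.3%, hypothetical weight 100
polls = [0.22, 0.21, 0.23, 0.215]   # made-up monthly poll shares

history = [summarize(belief)]
for share in polls:
    belief = update_with_poll(belief, share)
    history.append(summarize(belief))

for step, (mean, sd) in enumerate(history):
    print(f"after {step} polls: mean={mean:.3f}, sd={sd:.3f}")
```

Each poll moves the mean toward the polled share and narrows the distribution; how quickly depends on how much weight the prior and each poll are given.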

The results for the 8 parliamentary parties are below:

According to the model’s assessment, the most likely outcome for the Moderates (Moderaterna) is around 21-22%, but as can be seen from the probability distribution above, the vote share ranges from about 15% to 30%, which is a very wide spectrum. The model also considers it most likely that the Moderates will do somewhat worse this autumn than in 2014, when their vote share was 23.3%.

According to the model, KD will end up at 2-4%, and in addition runs a risk of as much as 76% of landing below 4%, i.e. of dropping out of the Riksdag.

The Liberals (Liberalerna) end up between 4% and 5% according to the model, and have a 36% risk of dropping out of the Riksdag.

The Sweden Democrats’ (Sverigedemokraterna) most likely share is put at 14-16%.

The Social Democrats (Socialdemokraterna) lose ground compared with 2014 according to the model, with a most likely vote share of 27-28%.

The Green Party (Miljöpartiet) also loses ground compared with 2014; the most likely outcome is 4-5.5%, and the party runs a 36% risk of dropping out of the Riksdag.

The Centre Party (Centerpartiet) does about as well as in 2014, when it got 6.33%; the model considers a result of 5-7% most likely.

The Left Party (Vänsterpartiet) is also predicted to do about as well this autumn as in 2014, when it got 5.72%, and the model now predicts 5-6%. The risk of the Left Party dropping out of the Riksdag is 20%.
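The “risk of dropping out” figures are tail probabilities: the share of the posterior distribution that lies below the 4% threshold. Given posterior samples, that is just a counting exercise, as in the sketch below. The Beta(3.2, 96.8) posterior is a hypothetical stand-in for a small party and not the model’s actual output.

```python
import random

def prob_below(alpha, beta, threshold=0.04, draws=50000, seed=0):
    """Estimate P(vote share < threshold) for a Beta(alpha, beta) posterior
    by counting how many posterior samples fall below the threshold."""
    rng = random.Random(seed)
    below = sum(rng.betavariate(alpha, beta) < threshold for _ in range(draws))
    return below / draws

# Hypothetical posterior with a mean of 3.2%, for illustration only.
risk = prob_below(3.2, 96.8)
print(f"estimated risk of ending up below 4%: {risk:.0%}")
```

The same counting works on samples from any sampler, whether they come from ABC, Stan or a closed-form distribution.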

Of all eight parties, only the Sweden Democrats will do noticeably better in 2018 than in 2014, while the Green Party is the party that will lose the most compared with the previous election.

I will update these figures, both when new monthly polls appear and as a result of attempts to refine the model over time.

My “Bayesian Inference Engine” 0.8 reports the following for the Christian Democrats (Kristdemokraterna) and the Liberals (Liberalerna):

According to the model, KD looks set to get between 2.0% and 4.0%.

The probability that KD drops out of parliament is 76% according to the model. (However, one should always remember the “Comrade 4%” effect (Kamrat 4%), i.e. last-minute support votes to keep a party in.)

According to the model, the Liberals get 4.0-5.0% and have a 36% probability of dropping out of parliament.

(For my non-Swedish readers: this article was originally written in Swedish and is about the upcoming Swedish elections. In case you are looking for general info on Bayesian inference, there are other posts on this site on that topic.)

As a little litmus test of my own understanding of Bayesian inference, and as a fun little project in an election year, I thought I would make my own forecasts of the outcome of the parliamentary election in September 2018. The method I intend to apply is to base my analysis on Sentio’s monthly opinion polls, which stretch back two or three years and will hopefully keep appearing every month until the election.

Anyway: I will use Bayesian Inference to make my forecasts, and have now written a very basic “Bayesian Inference Engine” in Python. It is simple mainly because I have not managed to get Markov Chain Monte Carlo running on my computer, and thus have few alternatives for getting any answers at all within reasonable CPU time. But you take what you have, and my Bayesian engine is built with Approximate Bayesian Computation, which means that it is slow and suffers badly from the combinatorial explosion that arises when the numbers involved become somewhat larger than one would like to handle.

Thus, I take the results of my simulations with a pinch of salt, and advise the reader to do likewise.

The first two experiments I have done concern the Green Party (Miljöpartiet) and the Sweden Democrats (Sverigedemokraterna), i.e. a forecast of their upcoming election results, based partly on a “prior” that relies heavily on the 2014 election result, and partly on “updates” based on Sentio’s monthly opinion polls.

Below is my first forecast for these two parties:

The graphs below show the probability distributions both for the “baseline” based on the 2014 election and for the computed distribution for the coming election.

According to my statistical analysis, the Green Party will get between 4.9% and 5.5% in the 2018 election. The probability that MP drops out of parliament, i.e. gets below 4%, is 20%.

According to the same model, the Sweden Democrats will get between 14.5% and 16.5% of the votes in the 2018 election.

It may be interesting to note from the graphs that the distribution for MP is narrower than for SD, which means that the model judges MP to be relatively stable around 5%, while the wider distribution for SD means that the uncertainty about the result is greater.

I will return in later posts with corresponding forecasts for the other parties.
