Bayesian Multi-predictor Regression – Valet2018

[Continuing my exploration of the Swedish election results. I thought this might also interest those of you who are not particularly interested in the Swedish elections, simply because of the potential mathematical-statistics (“MatStat”) insights, so the text is in English…]

The data presented here are the official preliminary election results from Valmyndigheten (the Swedish Election Authority), combined with data from SCB (Statistics Sweden) on general population characteristics, and cover all 290 Swedish municipalities (“kommun”).

So, let’s start by plotting a couple of potentially interesting parameters from the results of the recent elections. In this post, to get us started, I’ll focus on the results for a single political party, “Moderaterna” (M), but I have the results for all the other parties and might publish them at some later point.

[Heads-up: you are about to see a couple of very busy graphs, but stay with me, because these busy graphs actually reveal quite a bit of interesting info…]

Plot of the share of votes for M from three different perspectives: median income, education level, and share of foreign-born inhabitants:

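[Figure: share of votes for M per municipality vs. scaled education level, median income and share of foreign-born inhabitants]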

So, what do we actually see here…? First, there are three different plots in the graph, where each dot represents a specific municipality:

  1. Share of votes for M (y-axis) plotted against the share of inhabitants with a high level of education (x-axis): red dots and regression line.
  2. Share of votes for M plotted against median income: green dots and regression line.
  3. Share of votes for M plotted against the share of foreign-born inhabitants: magenta dots and regression line.

Second, the axes are not in absolute values but scaled to center the values on both axes. For either axis, the 0 point represents the mean (average) value of the parameter, so any point sitting at (0,0) has the mean value in both dimensions shown.

If we first focus on the green dots, representing the share of votes against median income in each municipality, we can see that they are fairly tightly clustered around 0 in the x-dimension, revealing that median income does not differ much between Swedish municipalities. Compared with the spread of the red dots (share of inhabitants with a high education level) or the magenta dots (share of foreign-born inhabitants), the green dots stay within roughly -0.25 to 0.30 on the x-scale, meaning that median income varies from about 25% below to 30% above the average.
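The scaling can be sketched as follows. The post does not state exactly how the axes were scaled; reading -0.25 as “25% below the average” suggests each value is expressed as its relative deviation from the mean, which is what the hypothetical helper `scale_relative_to_mean` below assumes. A z-score is included for comparison, since it also centers on 0 but measures distance in standard deviations rather than percent.

```python
import numpy as np

def scale_relative_to_mean(x):
    """Relative deviation from the mean (assumed scaling):
    0 is exactly average, -0.25 is 25% below it, +0.30 is 30% above it."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    return (x - mean) / mean

def z_score(x):
    """Alternative centering: 0 is the mean, 1 is one standard deviation above it."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Hypothetical median incomes (thousand SEK) for three made-up municipalities.
incomes = [200, 250, 300]
print(scale_relative_to_mean(incomes))  # relative deviations -0.2, 0.0 and 0.2
print(z_score(incomes))
```

Both versions put the average municipality at 0; they differ only in what one unit on the axis means.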

From the three corresponding regression lines, we can see that all three parameters have a positive slope, the first two (education and income) clearly so. A higher x-value goes together with a higher y-value: the higher the income, the larger the share of votes for M; the higher the share of people with a high education, the larger the share of votes for M. Judging from the slopes, we can suspect that the economic (income) factor is a key determinant of whether people vote for M or not. But… perhaps some of these parameters are interrelated…?
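Why interrelated parameters matter can be illustrated with simulated data. This is a minimal sketch assuming NumPy and entirely made-up numbers (not the Valmyndigheten/SCB data): education is constructed to correlate with income, while the simulated vote share depends on income only. A single-predictor fit still gives education a clearly positive slope; fitting both predictors jointly separates the two.

```python
import numpy as np

rng = np.random.default_rng(2018)
n = 290  # matches the number of municipalities; every value below is simulated

# Simulated, centered predictors (NOT the real election data).
# Education is built to correlate with income ...
income = rng.normal(0.0, 1.0, n)
education = 0.8 * income + rng.normal(0.0, 0.6, n)
# ... but the simulated vote share depends on income only.
vote_share = 0.5 * income + rng.normal(0.0, 0.3, n)

# Single-predictor least squares, like one of the per-parameter regression lines.
slope_edu_alone = np.polyfit(education, vote_share, 1)[0]

# Multi-predictor least squares: columns are intercept, income, education.
X = np.column_stack([np.ones(n), income, education])
coef, *_ = np.linalg.lstsq(X, vote_share, rcond=None)

print(f"education slope on its own:      {slope_edu_alone:.2f}")
print(f"income / education, fit jointly: {coef[1]:.2f} / {coef[2]:.2f}")
```

In this simulation, education on its own picks up a slope of roughly 0.4 purely through its correlation with income; fitted jointly, its coefficient drops to about zero while income stays near its simulated value of 0.5.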

Let’s run a multi-predictor regression to find out:

Multi-predictor regression:

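[Figure: Bayesian multi-predictor regression of the share of votes for M on education, income and share of foreign-born inhabitants]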

Here, we are still dealing with the same party and the same data, but now we have combined the three parameters (education, income, foreign-born) into a single multi-predictor regression, represented by the orange area, with the black dashed line representing the mean regression line. A couple of things to note: here, I’ve run a Bayesian regression, while the regression lines in the previous graph were non-Bayesian, just standard ordinary least squares. Since Bayesian methods work with probability distributions, while more traditional (“frequentist”) methods work with point estimates, we can explicitly show the uncertainty of the analysis: the orange area is in fact a whole bunch of regression lines drawn from the posterior, clustered more or less on top of each other, thereby illustrating the area of uncertainty. Furthermore, the “baby blue” area below is the 89% CI (“credible interval”), further illustrating the level of uncertainty in the result.
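For readers curious how an area made of many regression lines and an 89% credible interval come about, here is a minimal NumPy sketch. It is not the model behind the graph: instead of sampling with PyMC (MCMC), it swaps in a closed-form conjugate Bayesian linear regression with a Gaussian prior on the coefficients and a noise standard deviation that is assumed known. The function names `posterior_draws` and `line_band`, the prior width and the noise level are hypothetical choices for illustration.

```python
import numpy as np

def posterior_draws(X, y, sigma=0.3, prior_sd=1.0, n_draws=2000, seed=0):
    """Sample coefficients from the posterior of y = X @ beta + noise, with
    prior beta ~ N(0, prior_sd^2 I) and noise ~ N(0, sigma^2), sigma KNOWN
    (an assumption; a PyMC model would usually give sigma its own prior)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    precision = X.T @ X / sigma**2 + np.eye(X.shape[1]) / prior_sd**2
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / sigma**2
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_draws), mean

def line_band(draws, x_grid, col=1, prob=0.89):
    """One regression line per posterior draw against predictor `col`, with
    the other predictors held at 0 (their mean after centering). Returns the
    mean line and the central `prob` credible interval of the line."""
    lines = draws[:, [0]] + draws[:, [col]] * x_grid[None, :]  # (n_draws, n_x)
    tail = (1.0 - prob) / 2.0 * 100.0  # 5.5 for an 89% interval
    lo, hi = np.percentile(lines, [tail, 100.0 - tail], axis=0)
    return lines.mean(axis=0), lo, hi
```

With real data, `X` would hold an intercept column plus the three scaled predictors. Plotting one line per row of the draws gives the bundle of regression lines, and `lo`/`hi` outline the 89% band around the mean line.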

What we see here is that the income parameter is in fact the dominant term in the regression; put another way, of the three parameters measured, income is the most important one for determining whether people vote for M or not.
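One hedged way to put a number on “income is the most important parameter” is to ask, across posterior draws, how often each predictor has the largest absolute coefficient (intercept excluded). This only makes sense when the predictors are on comparable scales, as the centered axes here are, and the answer can shift with the scaling chosen. The function `prob_largest_effect` and the toy draws below are hypothetical, not output from the actual model.

```python
import numpy as np

def prob_largest_effect(draws, names):
    """Share of posterior draws in which each predictor has the largest
    |coefficient|. `draws` has one row per draw and one column per predictor
    (intercept already removed)."""
    draws = np.asarray(draws, dtype=float)
    winner = np.argmax(np.abs(draws), axis=1)  # index of the largest effect per draw
    counts = np.bincount(winner, minlength=draws.shape[1])
    return {name: float(c) / draws.shape[0] for name, c in zip(names, counts)}

# Toy coefficient draws (made up): columns are education, income, foreign_born.
toy_draws = [[0.10, 0.50, 0.05],
             [0.20, 0.40, -0.10],
             [0.45, 0.30, 0.00],
             [0.05, 0.60, 0.10]]
print(prob_largest_effect(toy_draws, ["education", "income", "foreign_born"]))
# {'education': 0.25, 'income': 0.75, 'foreign_born': 0.0}
```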

There’s a lot of other, more technical information in the graph, but let’s stop here for now and contemplate the major finding: the economy is the prime factor determining whether people vote for M or not… 🙂

About swdevperestroika

High tech industry veteran, avid hacker reluctantly transformed to mgmt consultant.

1 Response to Bayesian Multi-predictor Regression – Valet2018

  1. Joe Marasco says:

    Thanks for the English.

    Joe
