“All Models are wrong. However, some Models might be useful.”
In these Covid-times, traditional media, as well as various social media warriors, like to present “Excess Deaths” 2020, to either boost or downplay the severity of Covid-19.
However, the question one should always ask when seeing these articles/posts on “Excess Deaths” is:
Excess to *what*, exactly….?
If you want to draw any meaningful conclusions about the impact of Covid on deaths, you simply can’t take whatever number some analysis presents as “Excess Deaths” as gospel, at least not before you truly understand how that “Excess” is defined, what the baseline is, and how that baseline has been established.
That is, in order to determine “Excess Deaths”, you need a “model” for the baseline. And as we will see below, that model is very important – different models (and model parameters) will give *very* different results.
What follows is by no means an exhaustive list on how the notion of “Excess Deaths” can fool you, but might serve as an intro to the topic.
So, the basic problem with the notion of “Excess Deaths” is that excess can only be defined once we have decided what the Baseline is. That is, what exactly are we comparing the actual data to…? What’s the “normal outcome”….?
Often an “average” of something is used as a baseline, e.g. that’s what SCB does when defining the baseline as the average number of absolute deaths 2015-2019, thereby defining “Excess Deaths” as the difference between absolute deaths 2020 and average absolute deaths 2015-2019. This definition is already problematic on its own, because it fails to take demographics into account, but there’s also a more general problem with averages: averaging data means that you lose track of any and all variability within the data; all uncertainty suddenly disappears, which leads to false certainty. So, even that seemingly simple way of defining a baseline as an average has significant fundamental problems.
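To make the averaging problem concrete, here is a minimal Python sketch of the “2020 minus average of 2015-2019” definition. The death counts are made-up toy numbers (not SCB data), and the function name is my own invention; the only point is that the baseline average comes with a spread that the single “Excess” number hides.

```python
from statistics import mean, stdev

# Made-up toy death counts per year (NOT real SCB figures).
deaths = {2015: 1000, 2016: 1040, 2017: 980, 2018: 1020, 2019: 960, 2020: 1050}

def excess_vs_average(deaths, target_year, baseline_years):
    """Return (excess, baseline mean, baseline standard deviation)."""
    baseline = [deaths[y] for y in baseline_years]
    base_mean = mean(baseline)
    return deaths[target_year] - base_mean, base_mean, stdev(baseline)

excess, base_mean, base_sd = excess_vs_average(deaths, 2020, range(2015, 2020))
print(f"baseline {base_mean:.0f} +/- {base_sd:.0f}, excess {excess:.0f}")
```

With these toy numbers the “Excess” is of the same order as the year-to-year spread of the baseline years, which is exactly the kind of information that disappears when only the average is reported.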
Whichever way we define our baseline, different baselines will give different amounts of “Excess Deaths”.
For the Swedish preliminary death statistics from SCB.se, they use the absolute difference in deaths 2020 vs. the average of 2015-2019 to define “Excess”. Now, this is clearly misleading, due to differences in demographics between the years: in terms of population size, and even more importantly, in terms of age structure – the sizes of the different age cohorts can vary considerably from year to year.
So not even the Government Bureau of Statistics does a very good job of presenting data on “Excess Deaths” – while technically correct, their way of presenting “Excess” in absolute numbers is semantically misleading.
Below are a couple of other, potentially misleading ways to present “Excess Deaths”:
As the first example, let’s use a variant of curve fitting to define our baseline, more specifically a smoothing Hanning window function, and fit a baseline curve to the Swedish mortality data for the period of 1949 to 2020 (2020 data up until December 22nd):
In the graph above, the blue line shows the actual mortality data, and the orange line shows the fitted baseline curve, that is, the “Expected Deaths”. As we can see, 2020 indeed has excess mortality (the blue line resides higher than the orange line for 2020). Let’s convert that excess mortality to absolute number of extra deaths over and above this baseline, “Excess deaths”:
So here we can see that 2020 has – according to this particular model! – some 1500 “Excess Deaths”. However, we can also see that 2019 – still according to this particular model – had some 2500 “Deficit Deaths”.
But perhaps our intention is to downplay the severity of Covid…? In that case, we can make things look much better by a slight modification to our model: let’s change the parameter of the Hanning function window size from 5 to 20, that is, we let the function smooth the curve over 20 years instead of the previous 5:
With this tiny change of model, we now have made 2020 “Excess deaths” look very… insignificant. How insignificant…?
By changing the window size we now have about 45 (!) “Excess Deaths” for 2020, instead of the 1500 from earlier…!
You are hopefully starting to see my point…?
Now, let’s say that instead of downplaying the severity of Covid, we would want to boost it:
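For those who want to play with the window size themselves, here is a rough Python sketch of a Hanning-smoothed baseline. It is not the exact code behind the graphs: the mortality series is synthetic, the constant population is hypothetical, and the edge handling (repeating the first and last value) is my assumption for the sketch. The window weights follow the same formula as `numpy.hanning`, written out with the standard library only.

```python
import math

def hanning(m):
    """Hann window of length m (same formula as numpy.hanning)."""
    if m == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (m - 1)) for n in range(m)]

def smooth(values, window):
    """Hann-weighted moving average. The ends are padded by repeating the
    first/last value (an assumption; other edge treatments are possible).
    For even window sizes the window sits half a step off-center."""
    weights = hanning(window)
    total = sum(weights)
    half = window // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [
        sum(w * padded[i + j] for j, w in enumerate(weights)) / total
        for i in range(len(values))
    ]

# Synthetic mortality (deaths per 1000) for 1949-2020: slow decline plus a wiggle.
years = list(range(1949, 2021))
mortality = [10.0 - 0.02 * k + 0.3 * math.sin(k) for k in range(len(years))]
POPULATION = 10_000_000  # hypothetical, held constant to keep the sketch short

def excess_last_year(window):
    """Excess deaths in the last year relative to the smoothed baseline."""
    baseline = smooth(mortality, window)
    return (mortality[-1] - baseline[-1]) / 1000 * POPULATION

for window in (5, 20):
    print(window, round(excess_last_year(window)))
```

Same data, same method, only the window size differs, and the “Excess” for the last year comes out different.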
Next, let’s use a different model for the baseline, this time a first order linear regression, and let’s take even longer historical mortality data into account, and see what that model suggests regarding the Swedish “Excess Deaths”:
Above we have actual Swedish mortality 1861-2020 YTD in blue, and a Bayesian Linear Regression showing the expected value, the uncertainty of that expectation (the darker shade around the expectation), as well as a posterior prediction (the lighter, wider shade).
Well, the outcome of this model is really quite different – according to this model, Sweden has had a significant number of “Excess deaths” all the way from 1975 to 2020….!
Let’s quantify that excess by multiplying the difference between actual and expected mortality with population:
Whoa….! According to this model, absolute excess deaths have been around 10000 per year (!) the past few decades….! Remarkable that no one seems to have noticed…. ! Neither has there been any fuss about the exceptionally low number of “Excess Deaths” between 1919 – 1970…..
So, already now we should start to appreciate that the notion of “Excess Deaths” is very much dependent on what model we choose to represent the baseline: the curve fitting model from earlier vs. the Linear Regression model above produce wildly different results.
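The “difference between actual and expected mortality, multiplied by population” step can be sketched in a few lines of Python. Note that this uses plain ordinary least squares instead of the Bayesian Linear Regression behind the graphs, so there are no credible intervals here; the data and the constant population are made-up toy values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def excess_deaths(years, mortality, population):
    """Per-year excess = (actual - expected mortality) * population.
    Mortality is given as deaths per person per year."""
    slope, intercept = fit_line(years, mortality)
    return [
        (m - (slope * y + intercept)) * p
        for y, m, p in zip(years, mortality, population)
    ]

# Made-up toy numbers, not Swedish data.
years = [2016, 2017, 2018, 2019, 2020]
mortality = [0.0092, 0.0091, 0.0090, 0.0086, 0.0095]
population = [10_000_000] * 5  # hypothetical constant population

for year, excess in zip(years, excess_deaths(years, mortality, population)):
    print(year, round(excess))
```

One thing worth noticing: with a constant population, the yearly “Excess” and “Deficit” numbers from a least-squares line always sum to zero over the fitted period, so some years *must* come out as excess simply because others come out as deficit.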
Let’s continue with the Linear Regression model, and see whether things change if we choose a different starting year, let’s start 1968 (the year of Woodstock):
Well, well, well…: now, when the only thing we’ve changed is the year we start our analysis, it all of a sudden looks like we’ve had a significant “Death Deficit” over the past decade….! So now we are back in the “Let’s downplay the Severity”-Land again…
Let’s quantify that in terms of “Excess Deaths”:
So, according to *this* model, it now looks like 2020 had a “Death Deficit” of almost 5000 deaths…! And 2019 had almost 10000 “missing deaths”…
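The start-year effect is easy to reproduce with a toy series. The Python sketch below uses a made-up mortality curve (steep historical decline, a long plateau, then a mild decline), a hypothetical constant population, and ordinary least squares rather than the Bayesian regression used for the graphs. None of the numbers are Swedish data; the shape is merely chosen to resemble the general pattern.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def toy_mortality(year):
    """Made-up deaths per 1000: steep decline, plateau, then a mild decline."""
    if year <= 1950:
        return 20.0 - 0.1 * (year - 1861)
    if year <= 2010:
        return 11.1
    return 11.1 - 0.1 * (year - 2010)

def last_year_excess(start_year, end_year=2020, population=10_000_000):
    """Excess deaths in end_year against a line fitted from start_year on."""
    years = list(range(start_year, end_year + 1))
    rates = [toy_mortality(y) for y in years]
    slope, intercept = fit_line(years, rates)
    expected = slope * end_year + intercept
    return (rates[-1] - expected) / 1000 * population

for start in (1861, 1968, 2000):
    print(start, round(last_year_excess(start)))
```

With this toy curve, starting in 1861 gives a large positive “Excess” for the final year, while starting in 1968 gives a “Deficit” for the very same year and the very same data point.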
Let’s do one more “zoom-in” using the same model – let’s look at the period 2000-2020 YTD:
It might be interesting to note that, judging by the outcome of this model, 2019 as well as 2020 might be considered “outliers” – both reside outside the credible interval for posterior predictions (the lighter shaded area). For now, 2020 is only almost outside, but it will most likely be fully outside once the full year 2020 is in.
Just for the fun of it, let’s do ONE MORE zoom-in, looking at the period 2015-2020 YTD:
Now, when only looking at this period of 6 years, both 2019 and 2020 appear to be… nothing very exceptional… Note though that with only 6 data points, the model’s result is much more uncertain, which can be seen from the width of the shaded areas.
Let’s look at our (in)famous “Excess Deaths”, based on this model:
So, according to *this* model instance, 2020 YTD has a moderate number of “Excess Deaths”, about 2000, and the real “outlier” was 2019, with a “Death Deficit” of 3000.
Now, we could continue with other types of linear regression models, e.g. change our Linear Regression model from first order to polynomial (power >= 2) regression, or even use splines to thereby get a very smooth and flexible baseline curve, but I’ll leave that as an exercise for the reader. Instead:
Let’s try something else: a Linear Regression over the Y2Y mortality growth factors – surely *that* model will tell us the truth…?
Let’s first look at the full dataset, starting 1861:
Overall, the Y2Y mortality growth trend looks pretty flat, however, there are some severe outliers in the data, e.g. 1918 had VERY high mortality growth, immediately followed by exceptionally low growth 1918->1919 (which is kind of obvious, when you think about it…). And on that theme: look at 2019: mortality dropped quite a bit from 2018, and guess what happened 2019->2020…? Mortality grew again…! Regression to the mean, perhaps…?
Anyways, from this model we can conclude that mortality 2019 was about 3 percentage points below expectation, and 2020 about 5% above expectation.
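Here is a small Python sketch of the growth-factor idea, again with ordinary least squares standing in for the regression used in the graph, and with made-up toy mortality numbers. The growth factor for a year is simply that year’s mortality divided by the previous year’s.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def growth_factors(mortality):
    """Year-to-year growth factors: mortality[t] / mortality[t - 1]."""
    return [curr / prev for prev, curr in zip(mortality, mortality[1:])]

def growth_deviation(years, mortality):
    """Map each year (from the second one on) to actual minus expected
    growth factor, where 'expected' is a straight line fitted to the factors."""
    growth = growth_factors(mortality)
    xs = years[1:]
    slope, intercept = fit_line(xs, growth)
    return {x: g - (slope * x + intercept) for x, g in zip(xs, growth)}

# Made-up toy mortality (deaths per 1000): flat, one low year, then back to normal.
years = [2014, 2015, 2016, 2017, 2018, 2019, 2020]
mortality = [10.0, 10.0, 10.0, 10.0, 10.0, 9.5, 10.0]

for year, dev in growth_deviation(years, mortality).items():
    print(year, f"{100 * dev:+.1f} percentage points")
```

In this toy series the last year merely returns to the old level, yet it shows up as growth well above expectation, simply because the year before was unusually low.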
So, what then should we conclude from the above examples…?
Now, first of all, all the above models are “Toy models”, quick hacks I just made up for the purposes of this blog post, that is, they are not intended to present the “Truth” about Swedish “Excess Deaths”.
My point here, using the above models as illustration, is instead to demonstrate that the notion of “Excess Deaths” is fraught with difficulties in interpretation, and an absolute minimum for presenting numbers on “Excess Deaths” is for the presenter to clearly show how the baseline that excess is measured against is defined.
For my own analyses, I’ve ceased to use “Excess Deaths” simply because there are so many definitions of “Excess” flying around.
Secondly, as I’ve stated many times in other posts, for any serious look into deaths and mortality, you must take the population age structure into account: e.g. most western countries have aging populations, and it should go without saying that a population or year with many old people has a higher number of deaths than a population or year with few old people.
Finally, for a serious look at “true” mortality differences, at least over longer periods of time, population Life Expectancy – that tends to grow by 0.1-0.2 years per year – should also ideally be taken into account. However, it is non-trivial (at least for me) to figure out how to incorporate that factor in any deterministic way into a model – sure, I can run simulations that would show the impact of increasing Life Expectancy, but thus far I’ve not been able to figure out how to incorporate that factor into an analytical model – anyone out there with a nice canned example on how to incorporate LE into an analytical, age group stratified mortality model, let me know in the comments.
Meanwhile, for those who are interested in what Life Expectancy is, and why it’s a tricky beast to include in any analytical model, the links below provide lots of good info:
So, hopefully this illustrates that care must be taken whenever “Excess Deaths” are presented – without knowledge of how the baseline is defined, it’s meaningless to attach any significance to the numbers presented. The above models provide very different results for 2020 “Excess deaths”, ranging from 10000 above expectation to 5000 below, and pretty much everything between.
IMO, a much better way to illustrate differences in deaths and mortality between years (or countries) is to use proper age-adjusted mortality, where a standard population is the basis on which the various age cohort mortality rates are applied.
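As a minimal illustration of what age adjustment does, here is a Python sketch of direct standardization: each age group’s mortality rate is applied to a fixed standard population, and the results are summed. The two age groups, all the counts, and the standard population are made-up toy values, and the function names are mine.

```python
def crude_rate(deaths_by_age, population_by_age):
    """Total deaths divided by total population."""
    return sum(deaths_by_age.values()) / sum(population_by_age.values())

def age_adjusted_rate(deaths_by_age, population_by_age, standard_population):
    """Direct standardization: age-specific rates weighted by a standard population."""
    total_std = sum(standard_population.values())
    expected_deaths = sum(
        deaths_by_age[age] / population_by_age[age] * standard_population[age]
        for age in standard_population
    )
    return expected_deaths / total_std

# Toy standard population and two toy years with IDENTICAL age-specific
# mortality rates (0.002 and 0.05) but different age structures.
standard = {"0-64": 7500, "65+": 2500}
pop_a, deaths_a = {"0-64": 8000, "65+": 2000}, {"0-64": 16, "65+": 100}
pop_b, deaths_b = {"0-64": 7000, "65+": 3000}, {"0-64": 14, "65+": 150}

print("crude:   ", crude_rate(deaths_a, pop_a), crude_rate(deaths_b, pop_b))
print("adjusted:", age_adjusted_rate(deaths_a, pop_a, standard),
      age_adjusted_rate(deaths_b, pop_b, standard))
```

The crude rates differ noticeably between the two toy years, purely because year B has more old people, while the age-adjusted rates are the same, since nothing about the risk of dying within each age group has changed.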