Look at that headline: scary, isn’t it? A 42% increase in cases from the previous week!
Let’s have a look:
Look at the top plot depicting “cases”: notice how the latest two weeks are both above 1, i.e. each weekly case count is higher than that of the preceding week, by 23% and 42% respectively. Surely a clear sign that the dreaded DELTA is now here?
Perhaps not… again, it’s important to distinguish between “Relative Shark & Absolute Penguin”, that is, looking at both relative as well as absolute numbers:
Below are the weekly counts of cases, ICUs and deaths:
I don’t know about you, but to me, there’s a world of difference in conclusions between the “42% increase!” type of headline I’m sure we are now going to see in media, and the conclusions to draw from the bottom graph.
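To make the relative vs absolute distinction concrete, here is a minimal sketch that computes both for a series of weekly counts. The counts are hypothetical, chosen only so that the relative changes come out at 23% and 42%; they are not the figures behind the plots.

```python
def week_on_week(counts):
    """Return (ratio, absolute_change) for each week versus the week before.

    Assumes every previous-week count is non-zero.
    """
    return [(cur / prev, cur - prev) for prev, cur in zip(counts, counts[1:])]


# Hypothetical weekly case counts, for illustration only.
weekly_cases = [100, 123, 175]

for ratio, delta in week_on_week(weekly_cases):
    # The same ratio can hide a change of 50 cases or of 50,000 cases.
    print(f"{(ratio - 1) * 100:+.0f}% relative, {delta:+d} cases absolute")
```

The ratio alone is what the headline reports; the absolute change next to it is what tells you whether the ratio is worth worrying about.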
“Many of us have experienced a really bad boss. It is only with us humans that a bad leader can become a boss at all. We humans can deceive each other. In a pack of dogs, the leader is the one best suited for leadership. A dog that cannot lead does not become a leader. It is not possible in a dog’s world. The reason is simple and enviable; Dogs can not lie, manipulate and exaggerate or withhold things that may be appropriate to lie low with. A successful leader deserves the trust of the group. It can take time, just like dogs work. They prove themselves in a group, in their flock.”
As stated in the earlier post, there are several caveats with that analysis, the most fundamental two are:
Since I don’t know Hebrew, I’ve not been able to verify the data and its source (although the Twitter post does reference the source, health.gov.il). So, in what follows, I’m assuming that the numbers as given in the Twitter post are accurate.
The given data is only for the period of June 27th to July 3rd, meaning that we have very little data to work with, covering only a very short period. Thus, even when assuming the given numbers are accurate, there’s a lot of uncertainty in any estimates based on that small amount of data and that short period. How much uncertainty? It is exactly in this type of situation, with a limited amount of data, that Bayesian inference can help us get a handle on how much uncertainty there is and what it looks like, because it provides not only point estimates (combined with p-values that I’ve never really understood) but full probability distributions.
The model I’m using in this analysis is a bit more complex than the previous one: the current model is hierarchical, using partial pooling of vax status & age group based data, the idea being that regardless of age, there is _some_ commonality between the age groups in terms of vaccine efficacy. The result of this is something called “shrinkage”, where the individual age group based estimates are “pulled” a bit towards the overall mean parameter value.
Warm-up: a quick look at the Pfizer trial vaccine efficacy data
(For those of you not very interested in the technicalities of Bayesian analysis, feel free to skip this section and jump straight to the section on Israeli data below)
First, let’s apply a very simple Bayesian model (*fully* pooled, like standard A/B-testing) to the Pfizer trial data from this Lancet article, which is where I believe the 95% efficacy numbers originate:
My simplistic model reports an efficacy rate of 93%, with an 89% credible interval ranging from 90% to 96%. That is, very close to the numbers reported in the official trial data.
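For readers who want to see what an A/B-style Bayesian efficacy estimate can look like in code, below is a minimal sketch. It is not the model used for the numbers above: it swaps in a conjugate Beta-Binomial setup with flat priors, uses an equal-tailed interval instead of an HDI, and runs on hypothetical trial counts (the function names and all numbers are assumptions for illustration).

```python
import random


def efficacy_posterior(cases_vax, n_vax, cases_ctrl, n_ctrl, draws=20000, seed=1):
    """Sample efficacy = 1 - p_vax / p_ctrl from two independent Beta posteriors."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # Flat Beta(1, 1) prior on each incidence rate, updated with binomial data.
        p_vax = rng.betavariate(1 + cases_vax, 1 + n_vax - cases_vax)
        p_ctrl = rng.betavariate(1 + cases_ctrl, 1 + n_ctrl - cases_ctrl)
        samples.append(1 - p_vax / p_ctrl)
    return sorted(samples)


def central_interval(sorted_samples, mass=0.89):
    """Equal-tailed interval; a simpler stand-in for the HDI."""
    lo = int(len(sorted_samples) * (1 - mass) / 2)
    hi = int(len(sorted_samples) * (1 + mass) / 2) - 1
    return sorted_samples[lo], sorted_samples[hi]


# Hypothetical trial counts, NOT the published trial data.
samples = efficacy_posterior(cases_vax=10, n_vax=20000, cases_ctrl=150, n_ctrl=20000)
mean = sum(samples) / len(samples)
low, high = central_interval(samples)
print(f"efficacy mean {mean:.2f}, 89% interval {low:.2f} to {high:.2f}")
```

The point of the exercise is the last line: instead of a single efficacy number, you get a whole set of plausible values and can read off how wide the plausible range is.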
We can also run a partially pooled model on the Pfizer data for comparison. Basically, the difference is that in the partially pooled model below, the two incidence rates (test group vs control group) are not *fully* independent. Instead they experience “shrinkage”, that is, a bit of ‘pull’ from each other: the less data, the more pull. Both incidence rates are therefore pulled down a bit, the group with less data proportionally more.
The canonical analogy (credit to Richard McElreath) for explaining how partially pooled models work is as follows:
Imagine you are a Martian on your first visit to Earth, and you are interested in waiting times at cafés. Before your first visit, you have absolutely no idea how long you’d typically have to wait for your cuppa, so your expectation of the waiting time covers a huge range. Let’s say you visit your first café in Stockholm, and you have to wait 5 min. Now you have a bit more information on what the average waiting time might be, so you update your prior. Next, you visit a few more cafés in Stockholm, and by Bayesian updating, arrive at an expected waiting time of 3 min. Then you travel to Oslo, and study café waiting times there. The question is: should you now reset your expectation to your initial, wide one, since you have no a priori info on the waiting times in cafés in Oslo (your experience is exclusively from Stockholm)? That is, should you forget all about what you learned about café waiting times in Stockholm, now that you are in Oslo? Are you subject to retrograde amnesia all of a sudden, because you moved from Stockholm to Oslo?
If you believe that the waiting times in Stockholm vs Oslo are totally independent, i.e. there’s no value in knowing the avg. waiting time in Stockholm when you are in Oslo, then you should model each city on its own, with no sharing of information between them, that is, treat Oslo cafés as a totally different species than Stockholm cafés. (In McElreath’s terminology this is “no pooling”; the model labelled “fully pooled” in this post works this way, with one independent parameter per group.) On the other hand, if you believe that waiting times in Starbucks Stockholm and Starbucks Oslo are probably not that dissimilar, then you want to model the waiting time as a partially pooled one, using some of what you learned in Stockholm when studying Oslo, so that the data from Stockholm has some influence on your Oslo prediction.
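The café analogy can be sketched in a few lines. The example below is a simplification, not the hierarchical model used in this post: it assumes a normal-normal setup where the spread of individual waits (`sigma`) and the spread between cities (`tau`) are known, whereas a real hierarchical model estimates the between-group spread from the data. All Oslo numbers are hypothetical.

```python
def partially_pooled_mean(sample_mean, n, sigma, prior_mean, tau):
    """Posterior mean for one group: a precision-weighted blend of data and prior."""
    data_precision = n / sigma**2   # grows with the number of observations
    prior_precision = 1 / tau**2    # how tightly groups are believed to cluster
    weighted_sum = data_precision * sample_mean + prior_precision * prior_mean
    return weighted_sum / (data_precision + prior_precision)


stockholm_mean = 3.0  # minutes, what the Martian learned in Stockholm
oslo_wait = 8.0       # hypothetical average wait observed in Oslo

# One Oslo visit: the estimate stays close to the Stockholm experience.
after_one = partially_pooled_mean(oslo_wait, 1, sigma=2.0, prior_mean=stockholm_mean, tau=1.0)
# Twenty Oslo visits: the Oslo data dominates and the pull fades.
after_twenty = partially_pooled_mean(oslo_wait, 20, sigma=2.0, prior_mean=stockholm_mean, tau=1.0)

print(f"after 1 visit: {after_one:.2f} min, after 20 visits: {after_twenty:.2f} min")
```

With little data the estimate is pulled strongly towards what is already known from elsewhere; with plenty of data the group speaks for itself. That is the "the less data, the more pull" behaviour described above.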
So, below the Pfizer data modeled with a partial pooling model:
Using the partial pooling model, the efficacy is a smidgen higher: 94% instead of 93%.
We can see the shrinkage clearly by looking at the descriptive stats for the two models (fully pooled vs. partially pooled):
The above table shows the two models, “Full Pool” vs “Partial Pool”, and three descriptive stats for each: mean, HDI low and HDI high (for an 89% credible interval). We have two parameters of interest, both called alpha: one is the incidence rate for the control group (non-vaccinated), the other is the incidence rate for the test group (vaccinated). These params are on a log-odds scale.
The two p_alpha parameters are derived from the log-odds alphas, and show the incidence on a normal 0-1 probability scale for the two cohorts. Efficacy is on normal probability scale, and the parameter of our main interest. Finally, alpha_bar is the log-odds hyperparam for alpha. alpha_bar makes the model hierarchical, partially pooled and enables shrinkage, and p_alpha_bar is its representation on a normal probability scale.
If we focus on the two p_alphas, we can see that both of them get a tiny bit smaller when going from the fully pooled to the partially pooled model. That is, both of them are pulled towards a common mean (p_alpha_bar). This is shrinkage in action, enabled by the hyperparam alpha_bar, a common “ancestor” to both alphas.
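As a small illustration of how the log-odds parameters relate to the probability-scale ones, here is a sketch of the conversion. The function names and the two log-odds values are hypothetical, and efficacy is assumed to follow the standard definition of one minus the ratio of the two incidence rates.

```python
import math


def inv_logit(log_odds):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-log_odds))


def efficacy(alpha_test, alpha_control):
    """Efficacy as 1 - p_test / p_control, with both inputs on the log-odds scale."""
    return 1 - inv_logit(alpha_test) / inv_logit(alpha_control)


# Hypothetical log-odds values, not taken from the table above.
alpha_control = -4.8
alpha_test = -7.5

print(f"p_alpha control: {inv_logit(alpha_control):.5f}")
print(f"p_alpha test:    {inv_logit(alpha_test):.5f}")
print(f"efficacy:        {efficacy(alpha_test, alpha_control):.3f}")
```

Strongly negative log-odds correspond to small probabilities, which is why incidence rates of well under 1% show up as alphas somewhere between -4 and -8.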
Finally, for this rather technical section, we can take a look at a forest plot showing the difference between the fully pooled (red) and partially-pooled (orange) models :
For a good intro to hierarchical models, pooling and shrinkage, take a look at this blog post.
Now, let’s look at the Israeli data:
For reference, here’s a dataframe with the data from that Twitter post, combined with point estimates on incidence and efficacy (assuming a population size for each age group) :
Let’s next run the new, hierarchical model on this data, and first look at the distributions for incidence rates, non-vaxed (green) vs vaxed (red):
A couple of observations to make:
Notice that the credible intervals (roughly the confidence intervals, for the Frequentists among us 🙂) are much wider for the non-vaccinated than for the vaccinated. Why? Because there’s much more data on the vaccinated, since most Israelis are now vaccinated, so the model is more certain about the vaccinated.
Notice that for the cohort with the least amount of data, the non-vaxed 80-89 year olds, the incidence rate (as given by the round marker) has been pulled up from its calculated value of 0.03 in the dataframe above to 0.07. Shrinkage in action!
Next, let’s look at the Vaccine Efficacy distributions, per age group:
Again, a couple of things to notice:
Vaccine Efficacy calculated on this (very limited!) data is far from the 95% of the pre-release trials.
The uncertainty is huge: the credible interval for every age group except 80-89 spans zero.
Age group 80-89 has been pulled upwards by shrinkage quite a lot. The very low calculated incidence rates for this group, combined with shrinkage (the non-vax incidence rate being pulled up from 0.03 to 0.07, which is still way lower than the rates for the rest of the non-vax groups), put efficacy for this group at about 50%, albeit with a wide credible interval.
We can also look at a forest plot of the same data:
So, what conclusions regarding Vaccine Efficacy would I draw from this analysis…?
The short answer is: almost none. First, because I don’t have a clue regarding the quality of the data. Secondly, because there’s very little data. And thirdly, because the analysis makes it clear that there’s a lot of uncertainty in the results. But that last point is actually a valuable finding: I’ve seen a lot of Twitter posts referencing this dataset and drawing full-blown conclusions from it, but with a bit of Bayesian analysis, we can now appreciate that before jumping to any conclusions, we need more data, and we need to confirm data quality.
Do you know the vaccine efficacy numbers cited for COVID vaccines, the ones coming from the pre-release trials…? I’ve seen numbers in the 65-95% ball park, as e.g. in this Lancet article which is IMO better understood by reading this blog post.
A few days ago I noticed the below data from Israel, on Covid cases among vaccinated vs non-vaccinated:
Simply by eye-balling the top table, we see that most COVID cases now occur among the vaccinated. By a huge margin. For each age group.
That doesn’t look good! But there are some caveats. First, look at the column “Percent of Population Vaccinated”: the vast majority of people in Israel are now vaccinated, so unless we expect the vaccines to be 100% effective in blocking the virus (which I in fact believed vaccines to be, before COVID vaccines came into the picture), chances are that most of the infected will indeed come from the larger cohort, the vaccinated. Secondly, the table does not provide the group sizes, so we can’t calculate the incidence rates for the vaccinated vs non-vaccinated. Thirdly, the absolute number of cases (relative to the unknown group sizes) is very small, so the uncertainty with so few data points on cases, particularly for the non-vaccinated, is huge.
But we can still do a back-of-the-envelope type of calculation on vaccine efficacy – as long as we keep in mind the uncertainty coming from the small numbers – by assuming the group population size, and I’m going to be lazy and set it to 1M per age group.
With a bit of arithmetic, we get to:
Efficacy is given by the rightmost column. It’s way below the 65-95% range given e.g. by the Lancet article mentioned above. However, let’s not forget that this was calculated with very little data, so we should be very careful drawing any conclusions from this data, until we have more data on cases.
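The back-of-the-envelope arithmetic can be sketched as follows. The function name is hypothetical, and so is the 90% vaccination share in the usage example; only the 23 vs 2 case counts for the 80-89 group come from the data discussed in this post. Efficacy is assumed to be one minus the ratio of the vaccinated to the non-vaccinated incidence rate.

```python
def point_efficacy(cases_vax, cases_nonvax, frac_vaccinated, group_size=1_000_000):
    """Point estimate of efficacy from case counts and the vaccinated share of a group."""
    n_vax = group_size * frac_vaccinated
    n_nonvax = group_size * (1 - frac_vaccinated)
    incidence_vax = cases_vax / n_vax
    incidence_nonvax = cases_nonvax / n_nonvax
    if incidence_nonvax == 0:
        # No cases among the non-vaccinated: the ratio is undefined or unbounded.
        return float("-inf") if incidence_vax > 0 else float("nan")
    return 1 - incidence_vax / incidence_nonvax


# 23 vaccinated and 2 non-vaccinated cases; the 0.90 share is a made-up placeholder.
print(f"{point_efficacy(23, 2, frac_vaccinated=0.90):.2f}")
```

One thing the sketch makes visible: the assumed group size cancels out of the efficacy ratio, so the lazy choice of 1M per age group affects the incidence rates but not the efficacy estimate. What does matter is the vaccinated share and, above all, how few non-vaccinated cases there are to divide by.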
One way to see how certain / uncertain these numbers are can be obtained by running a Bayesian analysis, to obtain not only point estimates (as above), but full probability distributions for the efficacy rates for the various age groups.
I did a quick & dirty version of such an analysis for 7 of the 8 age groups above (the 90+ group gets an infinite negative efficacy since there are 0 cases in that group within the non-vaccinated):
One way to understand the uncertainty is to look at the 89% Credible Interval, given by the black horizontal bar: it crosses zero for all age groups, meaning that there’s some probability (density) on both sides of zero, that is, vaccine efficacy could be either positive or negative. And looking at the last graph, the one for 80-89 year olds, where we have only 23 cases for vaccinated and 2 for non-vaccinated, the credible interval is almost perfectly balanced around zero, meaning that we should probably not put much trust in the efficacy number for that age group.
Nevertheless, it seems that vaccine efficacy in Israel now, when most Israelis are vaccinated, does not reach even close to the 65-95% range given by the pre-release testing. Far from it.