Bayes’ rule illustrated by a table

I just found this excellent illustration of Bayes’ rule in the book “Bayes’ Rule with Python” by James V. Stone, and thought I’d share it here as a follow-up to earlier posts on Bayes’ rule, e.g. this one.

Assume we have examined 200 people, and found 10 different diseases and 4 different symptoms. With that data, we can come up with a table like this:

[Figure: joint and marginal counts table]

We have the diseases in columns 1..10 and the symptoms in rows 1..4; each cell in the matrix holds the number of people having that disease and showing that particular symptom.

The last row and last column sum up the totals: the last row shows the total number of people per disease, and the last column the total number of people per symptom.
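
The book’s exact counts aren’t reproduced here, but the structure of such a table is easy to build yourself. Below is a minimal sketch (Python with numpy/pandas; the counts are made up, pinned only to the values quoted later in this post: the cell for symptom 3 and disease 5 holds 16 people, disease 5’s column sums to 37, symptom 3’s row sums to 71, and the grand total is 200):

    import numpy as np
    import pandas as pd

    # Made-up counts (NOT the book's data); rows = symptoms 1..4, cols = diseases 1..10.
    # Pinned values: counts[2, 4] = 16, column 5 sums to 37, row 3 sums to 71, total = 200.
    counts = np.array([
        [5, 4, 4, 5,  7, 4, 5, 3, 5, 4],   # symptom 1
        [3, 3, 2, 4,  9, 2, 3, 3, 4, 3],   # symptom 2
        [7, 6, 5, 8, 16, 6, 7, 5, 6, 5],   # symptom 3
        [5, 5, 4, 5,  5, 4, 4, 3, 6, 6],   # symptom 4
    ])

    table = pd.DataFrame(counts,
                         index=[f"symptom_{i}" for i in range(1, 5)],
                         columns=[f"disease_{j}" for j in range(1, 11)])
    table["All"] = table.sum(axis=1)   # last column: total people per symptom
    table.loc["All"] = table.sum()     # last row: total people per disease
    print(table)                       # table.loc["All", "All"] == 200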

Below is a joint plot of the data:

[Figure: joint plot of the data]

As you might recall, Bayes’ rule is expressed by:

p(A|B) = p(B|A) * p(A) / p(B)

or, in this context:

p(disease|symptom) = p(symptom|disease) * p(disease) / p(symptom)

These terms above have particular names:

  • p(A) is called ‘prior’
  • p(A|B) is called ‘posterior’
  • p(B|A) is called ‘likelihood’
  • p(B) is called ‘marginal likelihood’
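
In code, the rule itself is a one-liner. Here is a trivial helper (my own naming, not from the book), which the worked example further below will plug numbers into:

    def bayes(likelihood, prior, marginal_likelihood):
        """Posterior p(A|B) = p(B|A) * p(A) / p(B)."""
        return likelihood * prior / marginal_likelihood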

What I like about the example from the book mentioned above is that the table clearly demonstrates how Bayes’ rule comes about. Let’s have a look:

So, we want to find p(A|B), that is, the probability of having a particular disease given that we show a specific symptom. The table above makes this very easy: let’s say we have observed symptom 3 (the row labeled 3) and want to know the probability, given that symptom, that we have disease 5.

So using Bayes’ rule, we can express this as:

p(disease_5 | symptom_3) = p(symptom_3 | disease_5) * p(disease_5) / p(symptom_3)

Now we can easily look up each of the right-hand terms in the table above:

  • p(symptom_3 | disease_5): we look in row 3, column 5, and find 16 people. Now we need to convert this count to a probability. Since we are conditioning on disease_5, we divide the 16 found in the cell by the total number of people, over all symptoms, that have disease_5, i.e. the number at the bottom of column 5, which is 37. So 16 out of the 37 people having disease_5 show symptom_3, which gives us a likelihood of 16/37, or about 0.43.
  • p(disease_5): again looking at the table, we see that out of the 200 people, 37 have disease_5, which gives us the prior probability (‘prevalence’ / ‘base rate’) of 37/200, or ~0.185.
  • Finally, the marginal likelihood, that is, the denominator of Bayes’ rule, is given by the total number of people having symptom_3 divided by all people, that is, 71/200, or ~0.355.

So what we get is (16/37 * 37/200) / (71/200),

which can be written as 16/37 * 37/200 * 200/71, where we can cancel the 37s and the 200s and obtain 16/71, or about 0.225. This is our sought-after probability of disease_5 given symptom_3, aka the ‘posterior probability’.

What’s interesting is that, instead of doing all the above calculations, we can also read the posterior directly from the table: since we are looking for the probability of disease_5 given symptom_3, we find the ‘16’ in that cell. So the probability of disease_5 given symptom_3 is the number of people in that cell (16) divided by the total number of people having symptom_3, which is 71, read from the last column. By the way, when dealing with probability distributions, that last column is often called the ‘marginal probability’ for symptoms, while the last row is the marginal probability for diseases.
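
These two routes to the posterior (the full Bayes computation and the table shortcut) are easy to check numerically, using only the four numbers quoted above:

    cell = 16            # people with disease_5 AND symptom_3
    disease_total = 37   # people with disease_5 (bottom of column 5)
    symptom_total = 71   # people with symptom_3 (last column of row 3)
    n = 200              # all people examined

    likelihood = cell / disease_total          # p(symptom_3 | disease_5) ~ 0.432
    prior = disease_total / n                  # p(disease_5)             = 0.185
    marginal = symptom_total / n               # p(symptom_3)             = 0.355
    posterior = likelihood * prior / marginal  # Bayes' rule              ~ 0.225

    shortcut = cell / symptom_total            # read straight off the table
    assert abs(posterior - shortcut) < 1e-12   # both routes give 16/71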

The dashed lines between the circled numbers in the table illustrate the key components of Bayes’ rule: the prior, the posterior, the likelihood and the marginal likelihood:

  • The prior is given by dividing the blue circle by the red circle, i.e. 37/200
  • The posterior is given by dividing the green circle by the magenta circle, i.e. 16/71
  • The likelihood is given by dividing the green circle by the blue circle, i.e. 16/37
  • The marginal likelihood is given by dividing the magenta circle by the red circle, i.e. 71/200

Thus, in the table above, posterior and prior are given by the horizontal dashed lines, while likelihood and marginal likelihood are given by the vertical dashed lines.

Since I just mentioned the marginal probabilities, let’s convert the table above to a probability distribution by dividing each cell by the total number of people:

[Figure: joint and marginal distribution table]

Now we have something called a ‘joint distribution’: instead of a table with absolute numbers, we have a table expressing the joint probability for each combination of disease and symptom. The row and column named ‘All’ are referred to as ‘marginal probabilities’; in the context of Bayes’ rule, the row named ‘All’ holds the prior probabilities for the diseases, and the column named ‘All’ holds the marginal likelihoods for the symptoms.
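
In code this is a single division of the counts matrix by its grand total, and the marginals are then just row and column sums. Continuing with the made-up counts from the first sketch:

    joint = counts / counts.sum()   # joint distribution p(symptom, disease)
    p_disease = joint.sum(axis=0)   # bottom 'All' row: priors p(disease)
    p_symptom = joint.sum(axis=1)   # right 'All' column: marginal likelihoods p(symptom)

    print(joint[2, 4])    # p(symptom_3, disease_5) = 16/200 = 0.08
    print(p_disease[4])   # p(disease_5)            = 37/200 = 0.185
    print(p_symptom[2])   # p(symptom_3)            = 71/200 = 0.355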

With a bit of coding, we can apply the logic described above to arrive at the posterior distributions for all diseases, given the symptoms:
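
Here is a sketch of what that bit of coding might look like, continuing from the snippet above: dividing each row of the joint distribution by its row marginal turns row s into the posterior distribution p(disease | symptom_s):

    posterior = joint / p_symptom[:, None]   # row s: p(disease | symptom_s)

    print(posterior[2, 4])        # p(disease_5 | symptom_3) = 16/71 ~ 0.225
    print(posterior.sum(axis=1))  # each row sums to 1: [1. 1. 1. 1.]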

[Figure: posterior probabilities p(disease | symptom)]

And finally, we can also look at the priors, likelihoods, and posteriors in a really messy graph 🙂

[Figure: priors, likelihoods and posteriors]

In order to verify your understanding of Bayes’ rule: look at disease 5, and look at the plots for symptoms 2 (red) and 3 (orange). Notice that for the likelihoods, symptom 3 (orange) is higher than symptom 2 (red), that is, symptom_3 is more probable given disease_5 than symptom_2 is. Still, for the posteriors, symptom_2 has the higher probability… What’s going on…? HINT: it’s *not* due to the prior, since the prior is the same…

If you look at the first table above, the one with the absolute numbers, and recall the ‘shortcut’ to Bayes’ rule, that is, dividing the number in the cell by the marginal (the row total), you should be able to figure out why a higher likelihood doesn’t necessarily result in a higher posterior probability. Also, thinking in terms of ‘True Positives’ vs ‘False Positives’ might help… 🙂
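
For a concrete illustration of the effect, note that the made-up counts in the first sketch were deliberately chosen so that disease 5 shows it too (the book’s actual numbers differ):

    # Made-up numbers from the first sketch (NOT the book's data):
    cell_s2, cell_s3 = 9, 16   # disease_5 column, rows for symptom_2 / symptom_3
    row_s2, row_s3 = 36, 71    # total people with symptom_2 / symptom_3
    col_d5 = 37                # total people with disease_5

    print(cell_s2 / col_d5, cell_s3 / col_d5)  # likelihoods: 0.243 < 0.432
    print(cell_s2 / row_s2, cell_s3 / row_s3)  # posteriors:  0.250 > 0.225
    # symptom_3 is more likely under disease_5, but it is also far more common
    # overall (71 vs 36 people), i.e. it brings many more 'false positives'
    # for disease_5; the bigger denominator pulls the posterior down.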
