Just finished reading Brian Clegg’s excellent ‘Dice World – Science and Life in a Random Universe’. Highly recommended to anyone who’s interested in the inherent unpredictability of our world, and even more highly recommended to all those empty suits who often believe that we inhabit a deterministic and predictable world…

Anyway, Clegg presents a very informative yet easy-to-understand example of Bayesian reasoning:

Let’s assume you know that I own a dog. What’s the probability that my dog is a Golden Retriever?

How would you go about finding out that probability?

Perhaps your first step would be to look for canine statistics from e.g. the national kennel club. Although kennel clubs only keep track of registered dogs, the information found there should give you at least a rough ballpark number for the sought-after probability that a dog owner – any dog owner – in your country would indeed own a Golden Retriever. Turns out that for Sweden, the ratio of Golden Retrievers to the total number of registered dogs is about 8% (as calculated from 2012 new registrations).

Without any further information about my dog ownership, that’s it – the best guesstimate we can come up with. In other words, your best guess for whether my dog is a Golden Retriever would be to set that probability to 8%.

But what if you were able to find additional pieces of data that might provide further information about my dog preferences? Let’s say that I have a coffee mug on my desk with an image of a Golden Retriever. Equipped with that additional piece of information, is there a reason to revise your initial estimate of 8%? And if so, in which direction…?

It’s here where Bayesian reasoning comes into play: Bayes’ theorem allows us to revise (update) our initial estimate (the ‘prior’) of 8% if we subsequently get hold of additional data that we believe will impact the likelihood.

So, let’s say that the new fact found – that I do have a mug with a Golden Retriever on my desk – indeed is significant. The question now becomes how significant that new fact is.

Here, unless we are able to find real stats on the relationship between possession of mugs with Golden Retrievers and actual Golden Retriever ownership, we will have to use our best judgement, that is, our best guess on how likely it is that a Golden Retriever owner also has a Golden mug (and, likewise, how likely such a mug is for someone who doesn’t own a Golden Retriever).

How we estimate that number can differ, but one way that often provides vastly better results than a simple wild guess is using ‘Wisdom of Crowds’, i.e. letting a (large) number of people make that guess, and taking the average of those guesses.
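As a minimal sketch of that ‘Wisdom of Crowds’ idea – with entirely hypothetical guesses, since we have no real survey data – averaging a handful of individual estimates looks like this:

```python
from statistics import mean

# Hypothetical guesses from a small "crowd", each estimating the probability
# that a Golden Retriever owner also owns a Golden mug. These numbers are
# made up for illustration only.
guesses = [0.30, 0.60, 0.45, 0.50, 0.70, 0.40, 0.55]

# The crowd's estimate is simply the average of the individual guesses.
crowd_estimate = mean(guesses)
print(f"Wisdom-of-crowds estimate: {crowd_estimate:.2f}")  # prints 0.50
```

With a genuinely large and diverse crowd, the average tends to wash out individual over- and under-estimates.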

But here, let’s just use my own best guess: I believe the likelihood of owning a Golden mug if you own a Golden Retriever is 50%. Similarly, my best guess for you having a Golden mug without owning a Golden Retriever is just 1%. Where did these numbers come from…? No idea – these are just my best guesses; lacking any better info on these two numbers, I settled on 50% and 1%.

What values would you come up with for these two parameters?

So, with these data – the prior belief (based on real stats over the entire dog-owning population) that the probability of my dog being a Golden Retriever is 8%, and the new information that I own a mug with an image of a Golden Retriever – what will our revised (posterior) Bayesian probability become?

In words, Bayes’ formula says: “the probability of event A given event B is equal to the probability of B given A, times the probability of A, divided by the probability of B”.

So, if we let A represent ‘owns a Golden Retriever’, and B ‘owns a mug with a Golden’, we can transcribe the formula to:

P(G|M) = (P(M|G) x P(G)) / P(M)

We know that P(M|G) = 0.5 (after all, we defined it to be 50% in an earlier step! :-), and we know from our kennel club’s dog database that P(G) = 0.08. But what about P(M), i.e. the probability of owning a mug with a Golden? Well, we have already defined that too, albeit indirectly: the probability for such a mug is given by 0.5 x 0.08 + 0.01 x 0.92 = 0.0492, that is, the sum of the probabilities ‘mugs owned by Golden Retriever owners’ and ‘mugs owned by non-Golden Retriever owners’.

So, now we have all parameters, and plugging in the numbers (0.5 x 0.08 / 0.0492), we get a revised (‘posterior’) probability for my dog being a Golden, given that I own a mug with an image of a Golden, of 81%!
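The calculation above is short enough to sketch directly in code, using exactly the numbers from the text (the 8% prior from the kennel club stats, and my two guessed likelihoods of 50% and 1%):

```python
# Bayes' theorem for the dog/mug example:
#   P(G|M) = P(M|G) * P(G) / P(M)
p_m_given_g = 0.50      # guessed: P(mug | owns a Golden Retriever)
p_m_given_not_g = 0.01  # guessed: P(mug | does not own a Golden Retriever)
p_g = 0.08              # prior: share of Golden Retrievers (kennel club stats)

# Total probability of owning a Golden mug (law of total probability):
p_m = p_m_given_g * p_g + p_m_given_not_g * (1 - p_g)  # 0.0492

# Posterior: probability my dog is a Golden Retriever, given the mug
p_g_given_m = p_m_given_g * p_g / p_m
print(f"P(G|M) = {p_g_given_m:.1%}")  # prints P(G|M) = 81.3%
```

Change the two guessed likelihoods and rerun to see how sensitive the posterior is to those assumptions.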

That’s a fair bit larger probability than our ‘generic’ initial guess of 8%!

Obviously, this revised probability depends entirely on the accuracy of our estimates for the new evidence, that is, on the importance, relevance and percentage values we assign to the fact that I own the mug.

Another, perhaps more intuitive way to understand Bayesian reasoning is as follows:

Assume there are 10000 dog owners. From the canine club’s yearly stats, we find that 8% of the dog population consists of Golden Retrievers. That gives us 800 Golden Retrievers. We have defined, by guessing, that given ownership of a Golden Retriever, the probability of owning a Golden mug is 50%. That gives us 0.5 x 800 = 400 Golden mugs, owned by Golden Retriever owners. We have also defined, also by guessing, that given non-Golden ownership, the probability of owning a Golden mug is 1%. That gives us 0.01 x (10000 – 800) = 92 additional Golden mugs, these 92 owned by non-Golden Retriever owners.

In total, we have 492 Golden mugs, of which 400 are owned by Golden Retriever owners. Thus, the revised probability for my dog being a Golden Retriever is 400/492, or 81%.
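This counting view translates almost word for word into code – same 10000 owners, same guessed 50% and 1% mug likelihoods as above:

```python
owners = 10_000
golden_owners = round(0.08 * owners)                  # 800 Golden Retriever owners
mugs_golden = round(0.50 * golden_owners)             # 400 mugs among them (guessed 50%)
mugs_other = round(0.01 * (owners - golden_owners))   # 92 mugs among the rest (guessed 1%)

# Of all mug owners, what fraction also owns a Golden Retriever?
posterior = mugs_golden / (mugs_golden + mugs_other)
print(f"{mugs_golden}/{mugs_golden + mugs_other} = {posterior:.0%}")  # prints 400/492 = 81%
```

Note that this is the same 81% as the formula gave: the counting argument is just Bayes’ theorem with the probabilities multiplied out into head counts.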