In a previous post, I discussed the seemingly unintuitive logic of the famous Monty Hall Problem. However, with some careful thinking, even without resorting to Monte Carlo Simulation, I’m able to make sense of that apparent paradox.
The paradox presented here, though, is *really* mind-bending. Personally, I'm unable to wrap my head around the result, even after having verified it both by following the rules of probability theory and by running a Monte Carlo simulation. The result is still hard to accept! Even professional statisticians have problems with this one, and the interested reader can find even more confusing info regarding this problem here.
If you happen to have a great intuition for how to understand this paradox, please share it in the comment section below!
The problem is taken from Brian Clegg’s brilliant book “Dice World” (much recommended!) and goes by “Born on a Tuesday”, and despite the arguments in the wiki-link above, I’m going to stick to the interpretation of “at least one boy” given in Clegg’s book:
Assume someone tells you: "I have two children. One is a boy born on a Tuesday. What's the probability that I have two boys?"
Your immediate intuition might tell you that the Tuesday part of the problem is just a red herring, and that the sought-after probability is 1/2. After all, it seems "obvious" that once you know one child is a boy, the only uncertainty is the gender of the second child, so surely that must be 1/2?
Nope. It might be of some value to look at the outcome space for the problem. With 2 children you could have:
Boy Boy
Boy Girl
Girl Boy
Girl Girl
That is, the outcome space consists of 4 equally likely outcomes, of which 3 are consistent with the fact that the parent has at least one boy. Of those 3 consistent outcomes, only one has two boys. So, the probability asked for is 1/3.
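The counting argument above can be written out as a minimal sketch. It assumes that boys and girls are equally likely and that the two children are independent; the variable names are just for illustration.

```python
from fractions import Fraction
from itertools import product

# Each child is a boy ("B") or a girl ("G").
# Assumption: all four ordered pairs are equally likely.
outcomes = list(product("BG", repeat=2))

# Keep only the outcomes consistent with "at least one boy".
at_least_one_boy = [o for o in outcomes if "B" in o]

# Of those, count the outcomes where both children are boys.
both_boys = [o for o in at_least_one_boy if o == ("B", "B")]

p_two_boys = Fraction(len(both_boys), len(at_least_one_boy))
print(p_two_boys)  # 1/3
```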
Let's verify this with a Monte Carlo simulation:

    import numpy as np
    import pandas as pd

    # generate a pair of children, each with a random gender and a random day of week
    def gen_children():
        gender = np.random.randint(0, 2, 2)  # 0 or 1 per child; 1 is treated as "boy" below
        day = np.random.randint(1, 8, 2)     # 1..7 per child, one value per weekday
        return gender[0], day[0], gender[1], day[1]

    iterations = 1000000
    siblings = [gen_children() for _ in range(iterations)]

    # one row per pair of siblings
    df = pd.DataFrame(siblings, columns=['gender_1', 'day_1', 'gender_2', 'day_2'])
The above code gives us a dataframe of 1M pairs of kids, with the columns gender_1, day_1, gender_2 and day_2 holding each child's random gender and weekday of birth.
Next, let's find the pairs with at least one boy and the pairs where both are boys, and from them compute the probability of two boys given that we know at least one of the children is a boy:

    # pairs where at least one child is a boy (gender == 1)
    at_least_one_boy = df.loc[(df['gender_1'] == 1) | (df['gender_2'] == 1)]

    # of those, the pairs where both children are boys
    both_boys = at_least_one_boy.loc[(at_least_one_boy['gender_1'] == 1) &
                                     (at_least_one_boy['gender_2'] == 1)]

    print('P(both boys | one boy) : ', len(both_boys) / len(at_least_one_boy))

P(both boys | one boy) : 0.3330825031253711
Indeed, the probability (when the problem statement is interpreted the way Clegg uses it) is 1/3.
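The simulated value is not exactly 1/3, so it is worth a rough check that the gap is just sampling noise. The sketch below assumes that about 3/4 of the 1M pairs survive the "at least one boy" filter (the exact count was not printed above) and uses the usual standard error of a proportion; `standard_error` is a helper introduced here for illustration only.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion estimated from n independent samples."""
    return math.sqrt(p * (1 - p) / n)

p_expected = 1 / 3
n_conditioned = 750_000        # assumption: roughly 3/4 of the 1M simulated pairs
observed = 0.3330825031253711  # the value printed by the simulation

se = standard_error(p_expected, n_conditioned)
z = (observed - p_expected) / se  # distance from 1/3 in standard errors
print(f"standard error: {se:.6f}, z-score: {z:.2f}")
```

A z-score well inside the range of -2 to 2 suggests the simulated value is consistent with 1/3.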
However, this answer does not account for the Tuesday part of the problem, the part that I discarded above as a red herring. And this is where the problem becomes, at least for me, extremely unintuitive, because it turns out that including the Tuesday part in the problem, believe it or not (and I still have a hard time believing it!), actually changes the probability!
So let’s compute that probability using Monte Carlo Simulation as well:

    # assumption: days are coded 1..7 and 2 stands for Tuesday;
    # by symmetry any fixed day gives the same result
    target_day = 2

    # pairs where at least one child is a boy born on the target day
    at_least_one_boy_born_tuesday = df.loc[
        ((df['gender_1'] == 1) & (df['day_1'] == target_day)) |
        ((df['gender_2'] == 1) & (df['day_2'] == target_day))]

    # of those, the pairs where both children are boys
    two_boys_given_at_least_one_born_tue = at_least_one_boy_born_tuesday.loc[
        (at_least_one_boy_born_tuesday['gender_1'] == 1) &
        (at_least_one_boy_born_tuesday['gender_2'] == 1)]

    print('P(two boys given at least one born Tuesday) : ',
          len(two_boys_given_at_least_one_born_tue) / len(at_least_one_boy_born_tuesday))

P(two boys given at least one born Tuesday) : 0.482585086152667
Wow! Exactly as Clegg shows in his book, the probability now changes to slightly below 0.5!
To figure out what happens analytically, let's enumerate the first few outcomes of the outcome space:
Boy (Mon) Girl (Mon) (1)
Boy (Tue) Girl (Mon) (2)
...
Boy (Sun) Girl (Mon) (7)
Girl (Mon) Girl (Mon) (8)
Girl (Tue) Girl (Mon) (9)
...
Girl (Sun) Girl (Mon) (14)
So there are 14 (2 x 7) combinations of a boy or a girl born on each day of the week, each paired with a girl born on a Monday.
In total there are 2 * 7 * 2 * 7 (196) possible combinations in the outcome space: gender[child 1] * day[child 1] * gender[child 2] * day[child 2].
Out of these 196 possible outcomes, we now need to figure out how many of them feature a boy born on a Tuesday.
I'm too lazy to draw the entire matrix on paper, so let's use Python to enumerate all 196 possibilities instead:

    #### analytic calculation ####
    import itertools as it
    import pandas as pd

    # gender A, day A, gender B, day B
    l = [[0, 1], [1, 2, 3, 4, 5, 6, 7], [0, 1], [1, 2, 3, 4, 5, 6, 7]]

    # cartesian product: every combination of gender and day for both children
    outcome_space = list(it.product(*l))
    outcome_space = pd.DataFrame(outcome_space,
                                 columns=['gender_A', 'day_A', 'gender_B', 'day_B'])
    outcome_space


    # assumption: days are coded 1..7 and 2 stands for Tuesday;
    # by symmetry any fixed day gives the same result
    target_day = 2

    # at least one child is a boy born on Tuesday
    boy_born_tue = outcome_space.loc[
        ((outcome_space['gender_A'] == 1) & (outcome_space['day_A'] == target_day)) |
        ((outcome_space['gender_B'] == 1) & (outcome_space['day_B'] == target_day))]

    # of those, the outcomes where both children are boys
    two_boys_at_least_one_born_tue = boy_born_tue.loc[
        (boy_born_tue['gender_A'] == 1) & (boy_born_tue['gender_B'] == 1)]

    print('P(two boys at least one born Tue) : ',
          len(two_boys_at_least_one_born_tue) / len(boy_born_tue))

P(two boys at least one born Tue) : 0.48148148148148145
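For what it's worth, the same number can be reached by counting with pen and paper instead of enumerating all 196 rows. The sketch below is one possible way to do that count, under the "at least one" interpretation used in this post and assuming equally likely genders and weekdays. The function `p_two_boys` and its parameter `n_categories` are names made up for this illustration.

```python
from fractions import Fraction

def p_two_boys(n_categories):
    """P(two boys | at least one boy with a given trait), where the trait
    takes one of n_categories equally likely values (7 for weekdays)."""
    n = n_categories
    total = (2 * n) ** 2             # all (gender, trait) combinations for two children
    neither = (2 * n - 1) ** 2       # neither child is a boy with the given trait
    at_least_one = total - neither   # simplifies to 4n - 1
    # Two boys: n * n trait combinations, minus those where neither boy has the trait.
    two_boys = n * n - (n - 1) ** 2  # simplifies to 2n - 1
    return Fraction(two_boys, at_least_one)

print(p_two_boys(7))         # 13/27
print(float(p_two_boys(7)))  # 0.48148148148148145
print(p_two_boys(1))         # 1/3
```

With a single category the extra detail carries no information and the result falls back to 1/3. With 7 weekdays it is 13/27, and the value creeps towards 1/2 as the number of categories grows, which at least hints that the more specific the extra detail is, the more it behaves like pointing at one particular child.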
Wow! Unfortunately, I have to say that I really do not understand (analytically) how the day of the week can change things, but it sure does. Anyway, I'm in good company, since there seems to have been lots of heated debate among experts in maths and probability, and as far as I can tell, the jury is still out.
The moral of the story is (again!) that "common sense" and "intuition" aren't of much help when dealing with probability. It's extremely easy to shoot yourself in the foot, apparently even for PhDs.