## Probability of a random man/woman pair where the woman is taller than the man ?

Experience might perhaps tell us that in most couples, the man typically is taller than the woman.  But what’s the probability for having things the other way around, i.e. the lass being taller than the lad?

Given the below statistics (from Karl L. Wuench) for mean hights and standard deviations, how would you go about figuring out the desired probability:

Men – mean 177 cm, standard deviation 7.1 cm

Women – mean 163 cm, standard deviation 6.6 cm

One way would be to run a simulation, so let’s do that:

# simulation
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as sps

maleMean = 177
maleStd = 7.1
femaleMean = 163
femaleStd = 6.6

males = np.random.normal(maleMean,maleStd,10000)
females = np.random.normal(femaleMean,femaleStd,10000)

iterations = 100000
female_taller = 0
maleList, femaleList = [],[]

for i in range(iterations):
male = np.random.choice(males)
female = np.random.choice(females)

if female > male:
maleList.append(male)
femaleList.append(female)
female_taller +=1

print ('iterations:',iterations)
print ('females taller:',female_taller)
print ('ratio:',female_taller/iterations)


Running this simple simulation, we get that 7.7% of couples paired in random will have the woman taller than the man.

Is there an analytical metod for checking the result…? Yes, there is: in short (for full details, check the complete code below), we want to find the probability where the hight difference between men and women is greater than 0 from a normal distribution.

The below graph illustrates this:

The top subplot shows the two distributions, women and men. The subplot below standardized the data to N(0,1).  The dashed vertical bars illustrate the standardized means for women vs men. To find a couple where the woman is taller than a man, follow the cumulative density line for man (blue dashed curve) to the point where it crosses the female mean. The the cdf value at that point is 0.08 (rounded to two decimals).  Which concurs pretty well with the value obtained from our simulation above.

(The math required for the calculation is embedded in the code below).

It might be a bit tricky, from the graph,  to see why this works, so here’s the intuition:

Let’s start by picking a random man from the male population. Our best estimate of his length, the expected value, is the mean.  The question then becomes “given our mean random man, what’s the probability to randomly select a women taller than the man?”

From the graph we find that the cdf for women, the red dashed curve, has a value of 0.92 at the intersection with the male mean value, meaning that the probability for a women being taller than male mean is 1 – 0.92, that is, about 8%.

Below a plot of the pairs meeting the criteria (woman > man):

And the next graph shows the length distributions man/woman for the pairs that meet the criteria:

As can be seen from the above graph, the distribution of man/women-pairs meeting the criteria clearly shows that the men chosen tend to come from well below the mean male hight, while the chosen women tend to come from well above the mean female hight – nothing surprising there, a posteriori, at least 🙂

'''
what's the p picking a woman under a mean man ? 92%, given by
woman's cdf.
what's the p picking a man under mean woman ? 8%, given by man's cdf
'''

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as sps

maleMean = 177.038
maleStd = 7.112
maleVar = maleStd * maleStd
femaleMean = 163.322
femaleStd = 6.604
femaleVar = femaleStd * femaleStd

# jointStd for common language effect size must be calculated from variance!
jointStd = np.sqrt(maleVar + femaleVar)

def standardize(values,mu,sigma):
return (values - mu) / sigma

x_vals = np.linspace(140,210,100)
x_vals_std = standardize(x_vals,maleMean,jointStd)

mean_delta = (femaleMean - maleMean) / jointStd
print (mean_delta)
# point where difference is 0
zero_diff = sps.norm.cdf(mean_delta,0,1)
print ('p(zero_diff):', 1. - zero_diff)
p_female_over_male_mean = sps.norm.cdf(0,mean_delta,1)
print ('p(female over male mean:',p_female_over_male_mean)
p_man_over_female_mean = sps.norm.cdf(mean_delta,0,1)
print ('p(male over female mean:',p_man_over_female_mean)
print ('Effect size:',(maleMean - femaleMean) / jointStd)

males = np.random.normal(maleMean,maleStd,10000)
females = np.random.normal(femaleMean,femaleStd,10000)

plt.subplot(2,1,1)
title =str(r'males: $\mu:{},\sigma:{}, females:\mu:{},\sigma:{}$').format(
np.round(maleMean,1),np.round(maleStd,1),np.round(femaleMean,1),np.round(femaleStd,1))

plt.title(title)
plt.plot(x_vals,sps.norm.pdf(x_vals,maleMean,maleStd),color='b')
plt.plot(x_vals,sps.norm.pdf(x_vals,femaleMean,femaleStd),color='r')
plt.axvline(maleMean,c='b',ls='dashed')
plt.axvline(femaleMean,c='r',ls = 'dashed')
plt.hist(males,normed=True,color='b',alpha=0.7)
plt.hist(females,normed=True,color='r',alpha=0.7)

plt.subplot(2,1,2)
title = str('Effect Size (CL):{}').format(np.round((maleMean-femaleMean)/jointStd,2))
plt.title (title)
plt.plot(x_vals_std,sps.norm.pdf(x_vals_std,0,1),color='b')
plt.plot(x_vals_std,sps.norm.pdf(x_vals_std,mean_delta,1),color='r')
plt.plot(x_vals_std,sps.norm.cdf(x_vals_std,0,1),color='b',ls='dashed')
plt.plot(x_vals_std,sps.norm.cdf(x_vals_std,mean_delta,1),color='r',ls='dashed')
plt.axvline(0,color='b',ls='dashed')
plt.axvline(mean_delta,color='r',ls='dashed')

annot1 = str('p(man  man')
plt.hist(males,normed=True,histtype='step',color='b',alpha=1,lw=3,
label='Male population',ls='dashed')
plt.hist(maleList,normed=True,color='c',alpha=0.7,
label='short males in pairs')
plt.hist(females,normed=True,histtype='step',color='r',lw=3,ls='dashed',
label='Female population')
plt.hist(femaleList,normed=True,color='red',alpha=0.7,
label='tall women in pairs')
plt.axvline(maleMean,color='b',lw=3,ls='dashed')
plt.axvline(femaleMean,color='r',lw=3,ls='dashed')
plt.legend(loc='upper right')
plt.show()