Experience might perhaps tell us that in most couples, the man typically is taller than the woman. But what’s the probability for having things the other way around, i.e. the lass being taller than the lad?

Given the below statistics (from Karl L. Wuench) for mean hights and standard deviations, how would you go about figuring out the desired probability:

Men – mean 177 cm, standard deviation 7.1 cm

Women – mean 163 cm, standard deviation 6.6 cm

One way would be to run a simulation, so let’s do that:

# simulation import numpy as np import matplotlib.pyplot as plt import scipy.stats as sps maleMean = 177 maleStd = 7.1 femaleMean = 163 femaleStd = 6.6 males = np.random.normal(maleMean,maleStd,10000) females = np.random.normal(femaleMean,femaleStd,10000) iterations = 100000 female_taller = 0 maleList, femaleList = [],[] for i in range(iterations): male = np.random.choice(males) female = np.random.choice(females) if female > male: maleList.append(male) femaleList.append(female) female_taller +=1 print ('iterations:',iterations) print ('females taller:',female_taller) print ('ratio:',female_taller/iterations)

Running this simple simulation, we get that 7.7% of couples paired in random will have the woman taller than the man.

Is there an analytical metod for checking the result…? Yes, there is: in short (for full details, check the complete code below), we want to find the probability where the hight difference between men and women is greater than 0 from a normal distribution.

The below graph illustrates this:

The top subplot shows the two distributions, women and men. The subplot below standardized the data to N(0,1). The dashed vertical bars illustrate the standardized means for women vs men. To find a couple where the woman is taller than a man, follow the cumulative density line for man (blue dashed curve) to the point where it crosses the female mean. The the cdf value at that point is 0.08 (rounded to two decimals). Which concurs pretty well with the value obtained from our simulation above.

(The math required for the calculation is embedded in the code below).

It might be a bit tricky, from the graph, to see why this works, so here’s the intuition:

Let’s start by picking a random man from the male population. Our best estimate of his length, the expected value, is the mean. The question then becomes “given our mean random man, what’s the probability to randomly select a women taller than the man?”

From the graph we find that the cdf for women, the red dashed curve, has a value of 0.92 at the intersection with the male mean value, meaning that the probability for a women being taller than male mean is 1 – 0.92, that is, about 8%.

Below a plot of the pairs meeting the criteria (woman > man):

And the next graph shows the length distributions man/woman for the pairs that meet the criteria:

As can be seen from the above graph, the distribution of man/women-pairs meeting the criteria clearly shows that the men chosen tend to come from well below the mean male hight, while the chosen women tend to come from well above the mean female hight – nothing surprising there, a posteriori, at least 🙂

''' what's the p picking a woman under a mean man ? 92%, given by woman's cdf. what's the p picking a man under mean woman ? 8%, given by man's cdf ''' import numpy as np import matplotlib.pyplot as plt import scipy.stats as sps maleMean = 177.038 maleStd = 7.112 maleVar = maleStd * maleStd femaleMean = 163.322 femaleStd = 6.604 femaleVar = femaleStd * femaleStd # jointStd for common language effect size must be calculated from variance! jointStd = np.sqrt(maleVar + femaleVar) def standardize(values,mu,sigma): return (values - mu) / sigma x_vals = np.linspace(140,210,100) x_vals_std = standardize(x_vals,maleMean,jointStd) mean_delta = (femaleMean - maleMean) / jointStd print (mean_delta) # point where difference is 0 zero_diff = sps.norm.cdf(mean_delta,0,1) print ('p(zero_diff):', 1. - zero_diff) p_female_over_male_mean = sps.norm.cdf(0,mean_delta,1) print ('p(female over male mean:',p_female_over_male_mean) p_man_over_female_mean = sps.norm.cdf(mean_delta,0,1) print ('p(male over female mean:',p_man_over_female_mean) print ('Effect size:',(maleMean - femaleMean) / jointStd) males = np.random.normal(maleMean,maleStd,10000) females = np.random.normal(femaleMean,femaleStd,10000) plt.subplot(2,1,1) title =str(r'males: $\mu:{},\sigma:{}, females:\mu:{},\sigma:{}$').format( np.round(maleMean,1),np.round(maleStd,1),np.round(femaleMean,1),np.round(femaleStd,1)) plt.title(title) plt.plot(x_vals,sps.norm.pdf(x_vals,maleMean,maleStd),color='b') plt.plot(x_vals,sps.norm.pdf(x_vals,femaleMean,femaleStd),color='r') plt.axvline(maleMean,c='b',ls='dashed') plt.axvline(femaleMean,c='r',ls = 'dashed') plt.hist(males,normed=True,color='b',alpha=0.7) plt.hist(females,normed=True,color='r',alpha=0.7) plt.subplot(2,1,2) title = str('Effect Size (CL):{}').format(np.round((maleMean-femaleMean)/jointStd,2)) plt.title (title) plt.plot(x_vals_std,sps.norm.pdf(x_vals_std,0,1),color='b') plt.plot(x_vals_std,sps.norm.pdf(x_vals_std,mean_delta,1),color='r') plt.plot(x_vals_std,sps.norm.cdf(x_vals_std,0,1),color='b',ls='dashed') plt.plot(x_vals_std,sps.norm.cdf(x_vals_std,mean_delta,1),color='r',ls='dashed') plt.axvline(0,color='b',ls='dashed') plt.axvline(mean_delta,color='r',ls='dashed') annot1 = str('p(man man') plt.hist(males,normed=True,histtype='step',color='b',alpha=1,lw=3, label='Male population',ls='dashed') plt.hist(maleList,normed=True,color='c',alpha=0.7, label='short males in pairs') plt.hist(females,normed=True,histtype='step',color='r',lw=3,ls='dashed', label='Female population') plt.hist(femaleList,normed=True,color='red',alpha=0.7, label='tall women in pairs') plt.axvline(maleMean,color='b',lw=3,ls='dashed') plt.axvline(femaleMean,color='r',lw=3,ls='dashed') plt.legend(loc='upper right') plt.show()