Probability of a random man/woman pair where the woman is taller than the man ?

Experience might perhaps tell us that in most couples, the man typically is taller than the woman.  But what’s the probability for having things the other way around, i.e. the lass being taller than the lad?

Given the below statistics (from Karl L. Wuench) for mean hights and standard deviations, how would you go about figuring out the desired probability:

Men – mean 177 cm, standard deviation 7.1 cm

Women – mean 163 cm, standard deviation 6.6 cm

One way would be to run a simulation, so let’s do that:

# simulation
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as sps

maleMean = 177
maleStd = 7.1
femaleMean = 163
femaleStd = 6.6

males = np.random.normal(maleMean,maleStd,10000)
females = np.random.normal(femaleMean,femaleStd,10000)

iterations = 100000
female_taller = 0
maleList, femaleList = [],[]

for i in range(iterations):
    male = np.random.choice(males)
    female = np.random.choice(females)

    if female > male:
        maleList.append(male)
        femaleList.append(female)
        female_taller +=1

print ('iterations:',iterations)
print ('females taller:',female_taller)
print ('ratio:',female_taller/iterations)

Running this simple simulation, we get that 7.7% of couples paired in random will have the woman taller than the man.

Is there an analytical metod for checking the result…? Yes, there is: in short (for full details, check the complete code below), we want to find the probability where the hight difference between men and women is greater than 0 from a normal distribution.

The below graph illustrates this:

malefemale1

The top subplot shows the two distributions, women and men. The subplot below standardized the data to N(0,1).  The dashed vertical bars illustrate the standardized means for women vs men. To find a couple where the woman is taller than a man, follow the cumulative density line for man (blue dashed curve) to the point where it crosses the female mean. The the cdf value at that point is 0.08 (rounded to two decimals).  Which concurs pretty well with the value obtained from our simulation above.

(The math required for the calculation is embedded in the code below).

It might be a bit tricky, from the graph,  to see why this works, so here’s the intuition:

Let’s start by picking a random man from the male population. Our best estimate of his length, the expected value, is the mean.  The question then becomes “given our mean random man, what’s the probability to randomly select a women taller than the man?”

From the graph we find that the cdf for women, the red dashed curve, has a value of 0.92 at the intersection with the male mean value, meaning that the probability for a women being taller than male mean is 1 – 0.92, that is, about 8%.

Below a plot of the pairs meeting the criteria (woman > man):

malefemale2

And the next graph shows the length distributions man/woman for the pairs that meet the criteria:

malefemale4

As can be seen from the above graph, the distribution of man/women-pairs meeting the criteria clearly shows that the men chosen tend to come from well below the mean male hight, while the chosen women tend to come from well above the mean female hight – nothing surprising there, a posteriori, at least 🙂

'''
what's the p picking a woman under a mean man ? 92%, given by
woman's cdf.
what's the p picking a man under mean woman ? 8%, given by man's cdf
'''

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as sps

maleMean = 177.038 
maleStd = 7.112 
maleVar = maleStd * maleStd
femaleMean = 163.322
femaleStd = 6.604
femaleVar = femaleStd * femaleStd

# jointStd for common language effect size must be calculated from variance!
jointStd = np.sqrt(maleVar + femaleVar)

def standardize(values,mu,sigma):
    return (values - mu) / sigma

x_vals = np.linspace(140,210,100) 
x_vals_std = standardize(x_vals,maleMean,jointStd)

mean_delta = (femaleMean - maleMean) / jointStd
print (mean_delta)
# point where difference is 0
zero_diff = sps.norm.cdf(mean_delta,0,1)
print ('p(zero_diff):', 1. - zero_diff)
p_female_over_male_mean = sps.norm.cdf(0,mean_delta,1)
print ('p(female over male mean:',p_female_over_male_mean)
p_man_over_female_mean = sps.norm.cdf(mean_delta,0,1)
print ('p(male over female mean:',p_man_over_female_mean)
print ('Effect size:',(maleMean - femaleMean) / jointStd)

males = np.random.normal(maleMean,maleStd,10000)
females = np.random.normal(femaleMean,femaleStd,10000)

plt.subplot(2,1,1)
title =str(r'males: $\mu:{},\sigma:{}, females:\mu:{},\sigma:{}$').format(
    np.round(maleMean,1),np.round(maleStd,1),np.round(femaleMean,1),np.round(femaleStd,1))

plt.title(title)
plt.plot(x_vals,sps.norm.pdf(x_vals,maleMean,maleStd),color='b')
plt.plot(x_vals,sps.norm.pdf(x_vals,femaleMean,femaleStd),color='r')
plt.axvline(maleMean,c='b',ls='dashed')
plt.axvline(femaleMean,c='r',ls = 'dashed')
plt.hist(males,normed=True,color='b',alpha=0.7)
plt.hist(females,normed=True,color='r',alpha=0.7)

plt.subplot(2,1,2)
title = str('Effect Size (CL):{}').format(np.round((maleMean-femaleMean)/jointStd,2))
plt.title (title)
plt.plot(x_vals_std,sps.norm.pdf(x_vals_std,0,1),color='b')
plt.plot(x_vals_std,sps.norm.pdf(x_vals_std,mean_delta,1),color='r')
plt.plot(x_vals_std,sps.norm.cdf(x_vals_std,0,1),color='b',ls='dashed')
plt.plot(x_vals_std,sps.norm.cdf(x_vals_std,mean_delta,1),color='r',ls='dashed')
plt.axvline(0,color='b',ls='dashed')
plt.axvline(mean_delta,color='r',ls='dashed')

annot1 = str('p(man  man')
plt.hist(males,normed=True,histtype='step',color='b',alpha=1,lw=3,
         label='Male population',ls='dashed')
plt.hist(maleList,normed=True,color='c',alpha=0.7,
         label='short males in pairs')
plt.hist(females,normed=True,histtype='step',color='r',lw=3,ls='dashed',
         label='Female population')
plt.hist(femaleList,normed=True,color='red',alpha=0.7,
         label='tall women in pairs')
plt.axvline(maleMean,color='b',lw=3,ls='dashed')
plt.axvline(femaleMean,color='r',lw=3,ls='dashed')
plt.legend(loc='upper right')
plt.show()

 

About swdevperestroika

High tech industry veteran, avid hacker reluctantly transformed to mgmt consultant.
This entry was posted in Data Analytics, Math, Numpy, Probability, Python, Simulation, Statistics and tagged , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s