As noted previously, e.g. in this post, care must be taken with statistics…

From Unherd.com:

Thanks to Richard McElreath’s Bayesian Inference class, I happened to have the data for the Simpson’s Paradox example on university admissions given in the article.

It’s a very illustrative example of what’s called “Confounding” or “Confounds” in statistics: basically, a confounder is a variable, often unobserved, that influences the other variables in the model, and can wreak havoc with causal inferences, e.g. by reversing them.
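To make the idea concrete, here’s a minimal simulation (not from the article, just an illustration): two variables that have no causal link at all still show a strong correlation, purely because a common cause z drives both. Subtracting out z makes the correlation vanish.

```
import numpy as np

rng = np.random.default_rng(0)

z = rng.normal(size=10_000)          # the confounder
x = z + rng.normal(size=10_000)      # z influences x
y = z + rng.normal(size=10_000)      # z influences y; x has NO effect on y

# spurious correlation induced by z (around 0.5 here)
print(np.corrcoef(x, y)[0, 1])

# "controlling for" z: the residual correlation is essentially zero
print(np.corrcoef(x - z, y - z)[0, 1])
```

If z were unobserved, we’d have no way to do that subtraction, and the spurious x–y association would look like a real effect.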

In the UCB example, the hypothesis was that since only 30% of women got admitted to PhD programmes, while the acceptance rate for men was over 40%, there must be gender discrimination in play, i.e. that the admission board discriminated against women.

Only by identifying the confounder – in this case, the departments of the university – did it become clear that there was no discrimination by the admission board: the reason women had a lower overall acceptance rate was that they on average applied to departments that accept very few students, while men on average applied to departments with much higher acceptance rates.

So, by “controlling for” the confounder, that is, the department, the hypothesis that the admission board discriminated against women could be refuted.
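The reversal is easy to reproduce with toy numbers (hypothetical, chosen only to make the effect stark, not the real UCB figures): within each department men and women are admitted at exactly the same rate, yet the aggregate rates make men look strongly favoured, because women mostly apply to the selective department.

```
import pandas as pd

# hypothetical two-department data: rates are equal within each dept
df = pd.DataFrame({
    'dept':         ['X', 'X', 'Y', 'Y'],
    'gender':       ['male', 'female', 'male', 'female'],
    'admit':        [80, 16, 2, 10],
    'applications': [100, 20, 20, 100],
})

# per-department acceptance rates: 0.8 in dept X, 0.1 in dept Y,
# identical for both genders
by_dept = df.assign(rate=df['admit'] / df['applications'])
print(by_dept[['dept', 'gender', 'rate']])

# aggregated over departments, the rates "reverse":
# men 82/120 ~ 0.68, women 26/120 ~ 0.22
overall = df.groupby('gender')[['admit', 'applications']].sum()
overall['rate'] = overall['admit'] / overall['applications']
print(overall)
```

The aggregate gap comes entirely from *where* each group applies, not from how the board treats them – which is exactly what stratifying by department reveals.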

Below is a quick Python hack to illustrate this.

[before you run this code, make sure you don’t have anything named ‘share’ in your current directory!]

```
import os

# WARNING: deletes any existing 'share' in the current directory!
os.system('rm -rf share')
os.system('git clone https://github.com/tolex3/share.git')

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

sns.set()

# load the UCB admissions data from the cloned repo
# (file name and separator assumed to match McElreath's UCBadmit.csv)
df = pd.read_csv('share/UCBadmit.csv', sep=';')

pivot = pd.pivot_table(df, index='dept',
                       columns='applicant.gender',
                       aggfunc='sum', margins=True)

# share of each department's applications coming from women / men
pivot['applications_f'] = pivot[('applications', 'female')] / \
    pivot[('applications', 'All')]

pivot['applications_m'] = pivot[('applications', 'male')] / \
    pivot[('applications', 'All')]

print(pivot[('applications', 'female')])
print(pivot[('applications', 'male')])
print(pivot[('applications', 'All')])
```