Gender Bias – but not the way you’d think : Simpson’s Paradox strikes again!

I’ve previously touched upon Simpson’s Paradox and the (for statisticians!) famous example from the University of California, Berkeley, where it indeed looked as if the admission board favored men over women: while 44% of male applicants got admitted, only 35% of the female applicants were admitted. So, fearful of a potential lawsuit, the Dean asked some statisticians to help clarify the issue.

In this post we’ll look at whether there indeed was any gender bias, and if so, whether we can quantify it.

But before that, we need to be clear about what actually constitutes gender or any other type of bias, in the first place:

Let’s say that of the students admitted to a specific program, 100% were male. Does that imply that the admission board discriminated against women…?  Not necessarily.  There are a couple of ways in which admission could be that skewed, without any admission bias being in play:

  • Perhaps the male students had much better grades than all of the women. That’s not very likely, though.
  • Or perhaps all the applicants to the program were men.

To be able to analyze whether there indeed is any bias, we must assume that all applicants, regardless of gender or any other classifier, are exactly equally qualified for the program. That is, all of the applicants have identical qualifications; no one is better than anybody else. If that’s not the case, then it’s not very meaningful to try to analyze potential bias from admittance numbers alone.

Before looking at the UCB case, it might be helpful to start with a constructed, simpler example, with only two departments and 200 students: 100 male, 100 female. Furthermore, let’s start with an example where no bias exists:


In this example, 42% of the men were accepted to the University, while only 18% of the women were accepted. At the university level, the acceptance difference is 24 pct points in favor of men. Clearly there must be gender bias in play…?


If you look closer at the Department numbers, in the columns F_admit_pct and M_admit_pct, you’ll notice that in both departments, exactly the same percentage of men and women were admitted: 50% of each gender in department A, and 10% of each gender in department B. Still, the overall, University-level admission has totally skewed proportions…

This is Simpson’s Paradox. Basically, the departments act as confounders, and when we “control” for department, the gap in the outcome variable we are investigating, percent admitted per gender, suddenly disappears (in other cases it can even change direction). No bias, no discrimination, just a paradox, stemming from a different view of the data.

The graph below illustrates this:


What’s going on behind the paradox is that women and men have different preferences: most of the men, 80%, apply to department A, a department that accepts 50 students, while most of the women, 80%, apply to department B, which accepts only 10 students. Thus, the reason the overall university admittance appears so biased against women is that most women apply to a program with very few seats, while most men apply to a program with lots of seats.
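The arithmetic behind this can be checked with a few lines of Python. This is only a minimal sketch: the applicant counts per department are derived from the 80%/20% split and the 100 applicants per gender stated above, and the names `applicants`, `admit_rate` and `overall_rate` are made up for illustration.

```python
# Constructed example from the text: 100 men and 100 women, two departments.
# Applicant counts follow from the 80% / 20% application split described above.
applicants = {
    "A": {"men": 80, "women": 20},  # most men apply to department A
    "B": {"men": 20, "women": 80},  # most women apply to department B
}
# Same admission rate for both genders within each department (no bias).
admit_rate = {"A": 0.50, "B": 0.10}


def overall_rate(gender):
    """University-level admission rate: total admitted / total applied."""
    admitted = sum(applicants[d][gender] * admit_rate[d] for d in applicants)
    applied = sum(applicants[d][gender] for d in applicants)
    return admitted / applied


men_overall = overall_rate("men")
women_overall = overall_rate("women")
print(f"men:   {men_overall:.0%}")
print(f"women: {women_overall:.0%}")
```

Even though each department treats both genders identically, the university-level rates come out at 42% for men and 18% for women, simply because each overall rate is an average of the department rates weighted by where that gender chose to apply.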

Simpson’s paradox occurs in many settings where data is categorized, e.g. by gender, ethnicity or whatever grouping classifier, and is often – but not always – the underlying reason for apparent bias in many contexts, such as admissions to schools, jobs etc.

So how could we quantify any bias, if one had existed, which in this case it does not? Well, a traditional statistician would probably use Hypothesis Testing, p-values and confidence intervals to assess statistical significance. I’m not a statistician, even though I’ve done formal statistics at university, and frankly, I’ve never fully understood the logic of p-values, hypothesis testing etc., so I won’t even try going down that path. However, Bayesian Inference is, at least to me, a much clearer, more logical way to understand whether there exists a difference between variables or parameters, so let’s go the Bayesian way.

And as always with Bayes, we are going to work with distributions. For this example, I’m not going to publish any code, because frankly, the details of the Bayesian Model to analyze this example are a bit tricky to understand: I’m using something called a “Generalized Linear Model”, which is capable of dealing with categorical variables such as departments, and furthermore I’m using a logit link function to map the linear regression model to the probability space. Explaining these concepts so that the reader could follow the code would take a handful of pages, without adding much value to the topic of this post, that is, understanding Simpson’s Paradox and how to quantify any potential bias. However, to those of you who’d like to understand the details, not only of this example but of Computational Bayesian Inference in general, I strongly recommend Richard McElreath’s book, “Statistical Rethinking”.

So, here are the probability distributions for admittance to department A:


We can clearly see that for women as well as for men, the distributions are centered at 0.5, that is, men as well as women have a 50% chance of being admitted to department A.

Department B:


Again, the distributions for men and women are fully overlapping, but now, since department B accepts very few students, the probability for both men and women to be admitted is centered on 10%.

The key to determining whether there’s been any bias is to look at whether the distributions differ, and in this case, from ocular inspection alone, we can see that they do not.

But we can be more stringent than that when we are Bayesian: we can also look at the distribution of the difference of the distributions… yes, that’s right: the distribution of the difference! Within some domains (e.g. Psychology, I believe) these are called “contrasts”.

Let’s look at department A:


The above graph shows the difference between the distributions for probability of admittance for men and women in department A. And we can see here that the difference is clearly centered at 0, meaning that there is no difference in probability of being admitted. The probability mass of the difference is fully balanced around zero, i.e. zero difference in probability.
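To make the idea of a “distribution of the difference” concrete, here is a minimal sketch in Python. Note that it is not the Generalized Linear Model used for the graphs in this post: as a much simpler stand-in it treats each gender in department A separately with a conjugate Beta-Binomial model, and the flat Beta(1, 1) prior is an assumption made purely for illustration. The counts (40 of 80 men, 10 of 20 women admitted) follow from the constructed example; the function name `posterior_samples` is made up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Department A in the first constructed example (counts follow from the text):
# 80 men applied and 40 were admitted; 20 women applied and 10 were admitted.
men_applied, men_admitted = 80, 40
women_applied, women_admitted = 20, 10


def posterior_samples(admitted, applied, size=100_000):
    """Sample the admission probability from its Beta posterior.

    Assumes a flat Beta(1, 1) prior (an illustrative choice, not the post's model).
    """
    return rng.beta(1 + admitted, 1 + applied - admitted, size=size)


p_men = posterior_samples(men_admitted, men_applied)
p_women = posterior_samples(women_admitted, women_applied)

# The "contrast": one difference per pair of posterior draws.
diff = p_men - p_women

mean_diff = diff.mean()
prob_men_higher = (diff > 0).mean()  # share of draws where men come out ahead
print(f"mean difference (men - women): {mean_diff:+.3f}")
print(f"P(men's probability > women's): {prob_men_higher:.2f}")
```

With no bias in the data, the differences scatter symmetrically around zero and men come out ahead in roughly half of the draws. The share of draws on one side of zero is the kind of number that shows whether one group really has the higher probability of admission.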

For department B, the difference graph looks the same.

Next, let’s introduce some real bias, by still admitting proportionally more men than women to the University, that is, “discriminating” against women at the University level, but, paradoxically, admitting more women than men into both departments:


Now, looking at the rightmost column, we can see that overall, at university level, there’s still an 11 pct point advantage for men, but at the individual department levels, there’s an advantage for women: 6 pct points in department A and a whopping 30 pct points in department B…! Simpson’s Paradox, again!

Let’s look at the graph:


Next, let’s see if we can quantify the bias in favor of women at department level, that is, let’s see if we can determine the difference in probability or chance of being admitted, given that you are male or female. Again, we will use Bayesian Inference, and have a look at the distributions for the probability of being admitted, for men vs women:

Department A:


Now, already by ocular inspection, it’s easy to see that there is a clear difference between genders in probability of being admitted: the distributions are, unlike in the previous example, not fully overlapping, and the red female histogram lies clearly further to the right, meaning that women have a greater chance of being admitted.

How much higher probability…? Let’s look at the distribution of the difference:


So, the difference in probability is centered at 10 pct points in favor of the women, and the probability that men have a higher chance of admission than women is only 2%. As far as I can figure out of ‘traditional’ statistical terminology, that plays a role similar to a “statistically significant” result with a p-value of 2%, although the two are not the same thing: the 2% here is a direct posterior probability of the hypothesis given the data, whereas a p-value is the probability of data at least this extreme given that there is no difference. However, as I said above, I have never really understood the concepts of p-values and statistical significance, so don’t take my word about them as gospel.

For department B, the corresponding graphs are:



For department B, the overlap of the distributions is even smaller, and now the difference in probability of admittance is a whopping 18 pct points.

So now, with this constructed example done, we are ready to figure out whether the Dean of UCB in 1973 had reason for concern regarding potential discrimination or not.

But that will have to wait until next time.









About swdevperestroika

High tech industry veteran, avid hacker reluctantly transformed to mgmt consultant.