Simpson's Paradox: When the Trend Reverses
Simpson's Paradox
Simpson's Paradox is a statistical phenomenon where a trend appears in several different groups of data but disappears or reverses when these groups are combined.
It is counter-intuitive. It breaks our brain. But it is real.
The UC Berkeley Gender Bias Case
In 1973, UC Berkeley's graduate admissions came under scrutiny for possible gender bias. The aggregate data showed that about 44% of male applicants were admitted to graduate school, but only about 35% of female applicants.
It looked like clear discrimination.
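But when researchers broke the numbers down by department, the gap against women largely disappeared. Few departments showed a significant bias in either direction, and once departments were accounted for, the data pointed to a small bias in favor of women.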
How?
Women tended to apply to competitive departments with low acceptance rates (like English). Men tended to apply to departments with high acceptance rates (like Engineering).
The pooled admission rate made it look like bias against women. But the department-level data showed, if anything, a small bias in favor of women.
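To see how the arithmetic works, here is a minimal sketch with two made-up departments. The numbers and the names `easy_dept`, `hard_dept`, `applications`, `per_dept`, and `overall` are hypothetical and chosen only for illustration. They are not the real Berkeley figures.

```python
# Hypothetical admissions data: (applicants, admitted) per group.
# These are illustrative numbers, NOT the actual Berkeley data.
applications = {
    "easy_dept": {"men": (800, 500), "women": (100, 70)},
    "hard_dept": {"men": (200, 40), "women": (900, 225)},
}


def rate(applied, admitted):
    """Admission rate as a fraction between 0 and 1."""
    return admitted / applied


# Admission rate for each group inside each department.
per_dept = {
    dept: {group: rate(*counts) for group, counts in groups.items()}
    for dept, groups in applications.items()
}


def overall(group):
    """Pooled admission rate for one group across all departments."""
    applied = sum(applications[d][group][0] for d in applications)
    admitted = sum(applications[d][group][1] for d in applications)
    return rate(applied, admitted)


for dept, rates in per_dept.items():
    print(f"{dept}: men {rates['men']:.1%}, women {rates['women']:.1%}")
print(f"overall: men {overall('men'):.1%}, women {overall('women'):.1%}")
```

In this toy data women have the higher rate in both departments, yet the lower rate overall, because most of the women applied to the department that rejects most applicants.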
The Batting Average Puzzle
Player A can have a higher batting average than Player B in the first half of the season. Player A can have a higher batting average than Player B in the second half of the season.
But Player B can have a higher batting average for the entire season.
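It depends on the number of at-bats in each half. The season average weights each half by its at-bats, so a player whose at-bats fall mostly in their stronger half can come out ahead overall.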
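The puzzle is easy to reproduce with small numbers. The sketch below uses invented stats for two hypothetical players. The `season` table and the helpers `average` and `season_average` are assumptions made for this example, not real player data.

```python
from fractions import Fraction

# Hypothetical (hits, at_bats) for each half of the season.
season = {
    "A": {"first_half": (4, 10), "second_half": (25, 100)},
    "B": {"first_half": (35, 100), "second_half": (2, 10)},
}


def average(hits, at_bats):
    """Batting average as an exact fraction (hits per at-bat)."""
    return Fraction(hits, at_bats)


def season_average(player):
    """Whole-season average: total hits divided by total at-bats."""
    hits = sum(h for h, _ in season[player].values())
    at_bats = sum(ab for _, ab in season[player].values())
    return average(hits, at_bats)


for half in ("first_half", "second_half"):
    a = average(*season["A"][half])
    b = average(*season["B"][half])
    print(f"{half}: A {float(a):.3f}, B {float(b):.3f}")

print(f"season: A {float(season_average('A')):.3f}, "
      f"B {float(season_average('B')):.3f}")
```

Player A wins each half (.400 vs .350, then .250 vs .200), but Player B wins the season (about .336 vs .264). Most of B's at-bats came in the first half, where B hit well. Most of A's came in the second half, where A hit worse.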
The Diagnosis
Aggregated data can hide the truth when the groups differ in size or makeup.
Always look at the subgroups. Averages lie. Details tell the story.