Base Rate Neglect: Why 99% Accuracy Is Meaningless
Base Rate Neglect
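Imagine a disease that affects 1 in 1,000 people. We have a test that is 99% accurate: it correctly flags 99% of people who have the disease and correctly clears 99% of people who do not.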
You test positive. What is the chance you have the disease?
Most people say 99%. The answer is about 9%.
The Math of False Positives
Let's test 1,000 people.
- 1 person has the disease. The test almost certainly catches it (True Positive).
- 999 people are healthy.
- The test has a 1% error rate. So 1% of the 999 healthy people will test positive. That is about 10 people (False Positives).
So we have about 11 positive results. Only 1 is real.
1 divided by 11 is roughly 9%.
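The same arithmetic can be written as a short Bayes' theorem calculation. This is a minimal sketch, assuming "99% accurate" means both sensitivity and specificity are 99%; the function name and parameter names are illustrative, not taken from any library. Because it applies the 99% catch rate to the sick person too, it gives a figure slightly below 1/11.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Fraction of the whole population that is sick AND tests positive.
    true_positives = prevalence * sensitivity
    # Fraction of the whole population that is healthy AND tests positive.
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumed inputs: 1 in 1,000 prevalence, 99% sensitivity, 99% specificity.
ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.99, specificity=0.99)
print(f"{ppv:.1%}")  # prints 9.0%
```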
Ignoring the Baseline
This is Base Rate Neglect. We focus on the test accuracy (99%) and ignore the prevalence of the disease (0.1%).
If a disease is rare enough, even a great test will produce mostly false alarms among its positive results.
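To see how much the base rate matters, the sketch below holds the test fixed and varies only the prevalence. It assumes 99% sensitivity and 99% specificity; the prevalence values and the helper name `chance_given_positive` are chosen purely for illustration.

```python
def chance_given_positive(prevalence, sensitivity=0.99, specificity=0.99):
    """Return P(disease | positive test) for a given base rate."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Illustrative base rates, from very rare (1 in 10,000) to common (1 in 2).
for prevalence in (0.0001, 0.001, 0.01, 0.1, 0.5):
    chance = chance_given_positive(prevalence)
    print(f"prevalence {prevalence:.2%}: P(disease | positive) = {chance:.1%}")
```

Under these assumptions, the same test that gives about 9% at a 1-in-1,000 base rate gives about 50% at 1 in 100, and only approaches its headline 99% when half the population is sick.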
The Terrorist Paradox
This applies to surveillance too.
If you have software that identifies terrorists with 99.9% accuracy, it is far less useful than it sounds.
There are very few terrorists. There are millions of innocent people. Even a 0.1% false positive rate means the software can flag thousands of innocents for every real terrorist.
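A rough count makes the scale concrete. Every number below is a hypothetical assumption for illustration only: a population of 300 million, 300 real targets (1 in a million), and software that is right 99.9% of the time about both groups. None of these figures come from real data.

```python
def expected_flags(population, real_targets, sensitivity, false_positive_rate):
    """Return the expected (true flags, false flags) from screening everyone."""
    innocents = population - real_targets
    true_flags = real_targets * sensitivity
    false_flags = innocents * false_positive_rate
    return true_flags, false_flags


# Hypothetical inputs: 300 million people, 300 real targets, 99.9% accuracy.
true_flags, false_flags = expected_flags(
    population=300_000_000,
    real_targets=300,
    sensitivity=0.999,
    false_positive_rate=0.001,
)
print(f"real targets flagged: {true_flags:.0f}")
print(f"innocents flagged:    {false_flags:.0f}")
print(f"innocents per real target: {false_flags / true_flags:.0f}")
```

With these assumed numbers the software flags roughly 1,000 innocent people for every real target, and the ratio gets worse as the real targets get rarer.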
The Diagnosis
Context matters. A positive test is not a verdict. It is a probability.
Always ask: "How rare is this condition?" If it looks like a zebra, it is probably just a horse with stripes painted on it.