Simpson’s Paradox: How to Lie with Statistics

We’ve all heard the saying popularly attributed to Benjamin Disraeli: “Lies, damned lies, and statistics.”

While statistics has proven to be of great benefit to mankind in almost every endeavor, inexperienced, sloppy, and downright unscrupulous statisticians have made some pretty wild claims. And because these wild claims are often presented as statistical fact, people in all industries, from business to healthcare to education, have chased these white elephants right down the rabbit hole.

Anyone who has taken even an introductory statistics course can tell you how easily statistics can be misrepresented. One of my favorite examples involves using bar charts to confuse the audience. Look at the chart below. It represents the number of games won by two teams in a season of beer league softball.

[Chart: games won by Team A and Team B, with the Y-axis starting above 0]

At first glance, you might think Team B won twice as many games as Team A, and that is indeed the intention of the person who made this chart. But when you look at the numbers to the left, you will see Team A won 15 games to Team B’s 20. While I am no mathematician, even I know 15 is not half of 20.

This deception was perpetrated by simply adjusting the starting point of the Y-axis. When you reset it to 0, the chart tells a different story.

[Chart: the same games-won data, with the Y-axis starting at 0]
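To put a number on how much the starting point matters, here is a minimal sketch of the arithmetic. The win totals (15 and 20) come from the example above. The truncated baseline of 10 is my assumption, picked because it is the value that makes Team B's bar look exactly twice as tall, and `apparent_ratio` is just a hypothetical helper name.

```python
def apparent_ratio(a_wins, b_wins, baseline):
    """Ratio of the drawn bar heights (B over A) when the Y-axis starts at `baseline`."""
    if baseline >= min(a_wins, b_wins):
        raise ValueError("baseline must be below both values")
    return (b_wins - baseline) / (a_wins - baseline)


team_a, team_b = 15, 20  # wins from the softball example

# ASSUMPTION: the misleading chart starts its Y-axis at 10 (not stated in the post).
truncated = apparent_ratio(team_a, team_b, baseline=10)
# The honest chart starts its Y-axis at 0.
honest = apparent_ratio(team_a, team_b, baseline=0)

print(f"Y-axis starts at 10: Team B's bar looks {truncated:.2f}x as tall as Team A's")
print(f"Y-axis starts at 0:  Team B's bar looks {honest:.2f}x as tall as Team A's")
```

Under that assumed baseline, the bars are drawn at a 2-to-1 ratio even though the real ratio of wins is 20 to 15, or about 1.33 to 1.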

Even Honest People Can Lie by Accident

In the example above, the person creating the chart was manipulating the data on purpose to achieve a desired effect. You may look at this and say, “I would never deceive people like that.” The truth is, you just might do it by accident.

What do I mean? Let’s take an example from an industry fraught with horrible statistics – our education system.

Below you will find a chart depicting the average math scores on a standardized test since 2000 for Happy Town, USA. You will notice the test scores are significantly lower now than they were back in 2000.

[Chart: average standardized math test scores in Happy Town, USA, since 2000]

What does this mean? Are the kids getting stupider? Has teacher quality gone down? Who should be held accountable for this? Certainly those lazy tenured teachers who are only there to collect their pensions and leech off the taxpayers.

I mean, look at the test scores. The average score has dipped from around 90 to close to 70. Surely something in the system is failing.

Now what if I were to tell you that the chart above, while correct, does not tell the whole story? Test scores in Happy Town, USA are actually up, if you look at the data correctly.

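What we are dealing with is something known in statistics as Simpson’s Paradox: a trend that shows up in every subgroup of the data can disappear, or even reverse, when the subgroups are combined. Even some of the brightest academic minds have published research that ignored this very important concept.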

What do I mean?

Let me tell you the whole story about Happy Town, USA. Happy Town was your average American middle-class town. In 2000, the economic makeup of the town looked like this: 20% of families made over $150K a year, 60% made between $50K and $150K, and 20% made less than $50K.

In 2008, that all changed. The recession hit, causing people to lose their jobs and default on their mortgages. Families moved out and housing prices fell. Thanks to the new lower housing prices, families from Not So Happy Town, USA were able to afford houses in Happy Town. They moved there in hopes of a better education and a better life for their children.

While the schools in Happy Town were better, the teachers were not miracle workers. These kids from Not So Happy Town did not have the strong educational foundation the pre-recession residents of Happy Town did. Many teachers found themselves starting almost from scratch.

No matter how hard these new kids and their teachers tried, they could never be expected to jump right in and perform as well as the pre-2008 Happy Town kids. Meanwhile, the economic makeup of the town shifted. Families earning under $50K now represent 60% of the town’s population, with the $50K to $150K group making up only 30% and the top earners dwindling to 10%.

So while taking an average of all the students is not a sign of someone necessarily trying to pull the wool over your eyes, it does not tell the whole story.

To see the whole story, and to unravel Simpson’s Paradox, you need to look at the scores across the different economic sectors of this town, which has undergone drastic changes.

[Chart: average math test scores in Happy Town, USA, broken out by economic sector]

Looking at it from the standpoint of economic sector, you will see the scores in each sector have improved, with the under $50K group improving at an impressive rate. Clearly the teachers and staff at Happy Town School are doing their job and then some.
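If it seems impossible for every group to improve while the overall average falls, the arithmetic can be sketched in a few lines. The population shares below are the ones given for Happy Town. The per-group scores are hypothetical numbers I made up purely for illustration, since the exact chart values are not reproduced here. The sketch also assumes each group's share of test takers matches its share of families, and `town_average` is a hypothetical helper name.

```python
# Share of families in each income group, in percent (from the Happy Town story).
shares_2000 = {"over_150k": 20, "50k_to_150k": 60, "under_50k": 20}
shares_now = {"over_150k": 10, "50k_to_150k": 30, "under_50k": 60}

# HYPOTHETICAL average scores per group, invented for illustration only.
scores_2000 = {"over_150k": 98, "50k_to_150k": 95, "under_50k": 62}
scores_now = {"over_150k": 99, "50k_to_150k": 96, "under_50k": 66}


def town_average(shares, scores):
    """Overall average: each group's score weighted by that group's share."""
    total_share = sum(shares.values())
    return sum(shares[group] * scores[group] for group in shares) / total_share


avg_2000 = town_average(shares_2000, scores_2000)
avg_now = town_average(shares_now, scores_now)

# Every group's score goes up...
for group in shares_2000:
    print(f"{group}: {scores_2000[group]} -> {scores_now[group]}")

# ...but the town-wide average goes down, because the mix of groups changed.
print(f"town average: {avg_2000:.1f} -> {avg_now:.1f}")
```

With these made-up scores, every group improves, yet the town-wide average drops from 89.0 to 78.3. Nothing about the teaching got worse; the lowest-scoring group simply went from 20% of the weight to 60%.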

So while the person who took the average of the whole school may not have intended to lie with their statistics, a deeper dive into the numbers showed that the truth was hidden inside the aggregate.

Keep this in mind next time someone shows you falling SAT scores, crime stats, or disease rates. All of these elements are easily affected by a shift in demographics. If you don’t see the breakdown, don’t believe the hype.
