Correlation Is Not Causation

Summary: Correlation describes a relationship between variables in which, when one changes, the other is likely to change too. Causation means that a change in one variable actually brings about the change in the other. Understanding this difference, especially through experimentation, helps avoid bias and supports better business and scientific outcomes.

In correlated data, a pair of variables is related in that one variable is likely to change when the other does. This relationship might lead us to assume that a change in one variable causes the change in the other, but that assumption doesn’t necessarily hold. I’ll clarify that kind of faulty thinking by explaining correlation, causation and the bias that often lumps the two together.

Correlation vs. Causation

Correlation: A correlation is a relationship or connection between two variables in which whenever one changes, the other is likely to also change.
Causation: Causation is a relationship in which a change in one variable causes the other variable to change. Verifying a causal relationship requires valid experimentation and analysis.

The brain simplifies incoming information so we can make sense of it. It often does that by making assumptions based on perceived relationships, which is a form of bias. But that thinking isn’t foolproof. Take, for example, the way we mistake correlation for causation. Bias may lead us to conclude that one event must cause another if both events changed in the same way at the same time. The sections below look at both concepts and at the human brain’s tendency toward bias.

Why Correlation Is Not Causation

A correlation is a relationship between two variables where, when one changes, the other is likely to change as well — but that doesn’t mean one causes the other. That would be causation.

For example, as children grow into adults, both height and mass typically increase. This doesn’t mean that increasing height causes increased mass or vice versa. Instead, a third factor — biological development — drives both. This makes height and mass correlated, but the causal factor is biological development.
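The height and mass example can be sketched as a small simulation. This is a toy model under invented assumptions: `age` stands in for biological development, the coefficients are made up, and height and mass are generated with no direct link between them. The `pearson` helper is written out here rather than taken from a library.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Toy model (all coefficients are invented): age stands in for
# "biological development" and drives both outcomes separately.
people = []
for _ in range(5000):
    age = rng.uniform(2, 18)
    height_cm = 80 + 5 * age + rng.gauss(0, 5)  # depends on age only
    mass_kg = 10 + 3 * age + rng.gauss(0, 4)    # depends on age only
    people.append((age, height_cm, mass_kg))

r_all = pearson([p[1] for p in people], [p[2] for p in people])

# Hold development roughly constant by looking at one narrow age band.
same_age = [p for p in people if 9.5 <= p[0] <= 10.5]
r_same_age = pearson([p[1] for p in same_age], [p[2] for p in same_age])

print(f"all ages:     r = {r_all:.2f}")
print(f"age 9.5-10.5: r = {r_same_age:.2f}")
```

Across all ages the two measurements should look strongly correlated, while inside the narrow age band the correlation should mostly vanish. In this model that is expected, because the shared driver is the only thing linking them.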

Causation in Business

Let’s say that we want to offer a promotion or discount to some of our customers. Our marketing department wants to maximize the delta. In other words, it wants the largest possible increase in sales as a result of the promotion. So we need to decide which customers will give us the best return on our investment for the promotion or discount. Should we offer it only to our top 10 percent of clients? Or should we target the bottom 10 percent?

You might assume that the customers who drive the most sales are also the ones most responsible for your business success, and therefore the ones to target. However, this assumption could be wrong. The best customers to offer the promotion to might be totally different. Without valid experimentation or analytics, you don’t have accurate answers to those questions.
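One way to answer the targeting question is a randomized test within each customer segment. The sketch below simulates such a test. The segment names, baseline spend and `TRUE_UPLIFT` values are hypothetical numbers chosen for illustration, not figures from any real campaign.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical assumption for this sketch: the promotion lifts spend
# by 2 for top customers and by 10 for bottom customers.
BASELINE = {"top": 200.0, "bottom": 20.0}
TRUE_UPLIFT = {"top": 2.0, "bottom": 10.0}

spend = {(seg, arm): [] for seg in BASELINE for arm in ("promo", "control")}
for _ in range(4000):
    segment = rng.choice(["top", "bottom"])
    # Random assignment is what makes the two arms comparable.
    arm = rng.choice(["promo", "control"])
    lift = TRUE_UPLIFT[segment] if arm == "promo" else 0.0
    spend[(segment, arm)].append(BASELINE[segment] + lift + rng.gauss(0, 5))

# Uplift per segment: mean spend with the promotion minus mean spend without.
estimated_uplift = {
    seg: mean(spend[(seg, "promo")]) - mean(spend[(seg, "control")])
    for seg in BASELINE
}
for seg, uplift in estimated_uplift.items():
    print(f"{seg:>6}: estimated uplift per customer = {uplift:.2f}")
```

Under these assumptions the segment with the highest sales is not the one that responds most to the promotion, which is exactly what total sales alone cannot tell you.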

Correlation Is Not Causation and Cognitive Bias

There are many forms of cognitive bias, or irrational thinking patterns, that often lead to faulty conclusions and economic decisions. The following are some reasons why people assume false causation in business and marketing:

Confirmation bias: People want to be right, so we tend to favor information that supports what we already believe. We often can’t admit or accept that we’re wrong about something, even if that attitude causes eventual harm and loss.
Illusion of causality: Putting too much weight on your own personal beliefs, being overconfident and relying on unproven sources of information often produce an illusion of causality. An economic example is the U.S. housing bubble of the mid-2000s. Millions of people believed that buying a home for much more than its actual value would continue to result in a return on the investment just because that had happened in the past.
Money: You want to sell your product. That desire to make money can often cloud your logic. As a result, you might end up spending more on marketing and other business expenses than they return.
Major marketing implications: Marketing statistics and data are often complicated and confusing. It can be easy to see relationships between changing sales numbers and the many other variables in your business when no causation exists.

Identify Correlation and Causation Through Experimentation

Knowing that something is valuable requires experimentation. Experimentation helps you understand whether you’re making the right choices. But it has a cost. If you hold a group back by not giving them a feature that brings in value, you’ll lose money, but you’ll also learn the importance of that feature.

The value of an experiment, then, lies in accomplishing these two things:

Deciding between different choices.
Quantifying the value of the best choice.
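Both goals can be read off a simple holdout comparison. The sketch below is one possible approach, not a prescribed method: it uses a normal approximation for the confidence interval, and the revenue figures are synthetic. `effect_with_interval` is a name made up for this example.

```python
import random
from statistics import mean, variance


def effect_with_interval(treated, control, z=1.96):
    """Difference in means with an approximate 95% confidence interval.

    Uses a normal approximation, which is reasonable for large groups.
    """
    diff = mean(treated) - mean(control)
    std_err = (variance(treated) / len(treated)
               + variance(control) / len(control)) ** 0.5
    return diff, diff - z * std_err, diff + z * std_err


rng = random.Random(2)  # fixed seed so the run is repeatable

# Synthetic revenue per user; the true effect of the feature is set to 2.
with_feature = [rng.gauss(12, 3) for _ in range(500)]
holdout = [rng.gauss(10, 3) for _ in range(500)]

diff, low, high = effect_with_interval(with_feature, holdout)
ship_feature = low > 0  # deciding: is the whole interval above zero?

print(f"estimated value: {diff:.2f} (95% CI {low:.2f} to {high:.2f})")
print("ship feature:", ship_feature)
```

The sign of the interval supports the decision, and its width says how precisely the value of the chosen option has been quantified. A larger holdout narrows the interval but costs more in lost revenue.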

3 Types of Experimental Variables

A scientifically valid experiment needs to have three types of variables: controlled, independent and dependent.

A controlled variable is kept constant, so other variables that change in relation to each other can be measured in a static environment.
An experiment’s independent variable is the one the experimenter deliberately changes, and it should be the only thing that differs between the groups being compared.
Dependent variables are the results that are observed when changes are made to independent variables.

Any uncontrolled variables, often called confounding variables, can cloud an experiment’s accuracy. So they need to be identified and either eliminated or controlled for in order to properly assess the experiment’s results. Differences in uncontrolled variables can also distort the apparent relationship between independent and dependent variables.

Uncontrolled variables add the influence of unrelated factors to an experiment’s results. Correlations might be assumed, and a hypothesis formed, where no real relationship exists. Accurate analysis then becomes difficult or impossible.
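To see how much an uncontrolled variable can distort a result, consider this simulated case. Everything in it is an assumption for illustration: a hypothetical promotion email with a true effect of 5, and a `loyal` flag that influences both who gets the email and how much they spend.

```python
import random
from statistics import mean

rng = random.Random(3)  # fixed seed so the run is repeatable
TRUE_EFFECT = 5.0       # assumed effect of the promotion email on spend

rows = []  # each row is (loyal, emailed, spend)
for _ in range(6000):
    loyal = rng.random() < 0.5
    # Uncontrolled variable: loyal customers are emailed far more often...
    emailed = rng.random() < (0.8 if loyal else 0.2)
    # ...and they also spend more regardless of the email.
    base = 100.0 if loyal else 50.0
    spend = base + (TRUE_EFFECT if emailed else 0.0) + rng.gauss(0, 10)
    rows.append((loyal, emailed, spend))


def email_effect(subset):
    """Mean spend of emailed customers minus mean spend of the rest."""
    with_email = [s for _, e, s in subset if e]
    without_email = [s for _, e, s in subset if not e]
    return mean(with_email) - mean(without_email)


naive = email_effect(rows)
# Hold loyalty constant: compare within each group, then average.
within_loyal = email_effect([r for r in rows if r[0]])
within_other = email_effect([r for r in rows if not r[0]])
adjusted = (within_loyal + within_other) / 2

print(f"naive estimate:    {naive:.1f}")
print(f"adjusted estimate: {adjusted:.1f} (true effect {TRUE_EFFECT})")
```

The naive comparison mixes the email’s effect with the effect of loyalty and overstates it several times over. Comparing like with like brings the estimate back close to the value built into the simulation. The equal-weight average is a simplification that works here because the two groups are about the same size.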

Correlation Is Not Causation Examples

It’s easy to watch correlated data change in tandem and assume that one thing causes the other. That’s because our brains are wired to look for cause and effect. We need to make sense of large amounts of incoming data, so our brain simplifies it. The mental shortcuts it uses are called heuristics, and they’re often useful and accurate. But not always. Heuristics go wrong whenever they lead you to believe that correlation implies causation.

Spurious Correlations

Spurious correlation is a mathematical relationship in which two or more events or variables are associated but not causally related, due either to coincidence or to the presence of a third, unseen factor.
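The coincidence case is easy to reproduce. In the sketch below every series is independent random noise, so any correlation found is spurious by construction. The sizes (`N_POINTS`, `N_CANDIDATES`) are arbitrary choices for the example, and `pearson` is a helper written out here.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(4)  # fixed seed so the run is repeatable
N_POINTS = 10           # e.g. ten yearly observations per metric
N_CANDIDATES = 200      # unrelated metrics we compare against

target = [rng.gauss(0, 1) for _ in range(N_POINTS)]
candidates = [[rng.gauss(0, 1) for _ in range(N_POINTS)]
              for _ in range(N_CANDIDATES)]

# Every series is independent noise, yet the best match looks impressive.
strongest = max(abs(pearson(target, c)) for c in candidates)
print(f"strongest |r| among {N_CANDIDATES} unrelated series: {strongest:.2f}")
```

With short series and enough candidates to search through, a strong-looking match is almost guaranteed to turn up. That is why a correlation found by scanning many variables deserves extra suspicion.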

Children and Music Lessons

After a study of human brain development, researchers concluded that kids between 4 and 6 years old who took music lessons showed evidence of boosted brain development in areas related to memory and attention. Based on this study, our biased brain might connect the dots quickly and conclude that music lessons improve brain development. But there are other variables to consider. The fact that the children took music lessons may be an indicator of family wealth. If so, they probably had access to other resources that are known to boost brain development, like good nutrition.

The point of this example is that researchers can’t assume from only this data that music lessons affect brain development. Yes, there’s clearly a correlation, but there’s no actual evidence of causation. We need more data to get a true causal explanation.

Cancer and Mobile Phones

If you study a chart that shows both the number of cancer cases and the number of mobile phones, you’ll notice that both numbers went up in the last 20 years. If your brain processes this information with a cause-and-effect bias, you might decide that mobile phones cause cancer. But the chart doesn’t support that conclusion. Its only evidence is that both numbers happened to increase over the same period. A lot of other things have also increased in the past 20 years, and they can’t all cause cancer or be caused by mobile phone use.
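A shared upward trend is enough to produce this effect. The sketch below uses synthetic series, not real cancer or phone statistics: two sets of monthly figures that rise over 20 years and are otherwise unrelated. Comparing month-over-month changes instead of raw levels is one common way to take the trend out, and it is the only check shown here.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def changes(series):
    """Month-over-month changes, which remove a steady trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(5)  # fixed seed so the run is repeatable
MONTHS = 240            # 20 years of monthly figures

# Synthetic data, not real statistics: the two series share nothing
# except the fact that both grow over time.
phones = [1000 + 40 * t + rng.gauss(0, 50) for t in range(MONTHS)]
cases = [500 + 2 * t + rng.gauss(0, 5) for t in range(MONTHS)]

r_levels = pearson(phones, cases)
r_changes = pearson(changes(phones), changes(cases))

print(f"correlation of raw levels:      {r_levels:.2f}")
print(f"correlation of monthly changes: {r_changes:.2f}")
```

The raw levels should correlate almost perfectly, while the monthly changes should show little or no relationship. Anything that grows over the same period would pass the first test, so only the second says something about whether the series actually move together.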

How to Find Causation With Explainability

To find causation, we need explainability. In the era of artificial intelligence and big data analysis, this topic has become increasingly important. AI algorithms make data-based recommendations. Sometimes, humans can’t see any reason for those recommendations except that an AI made them. In other words, the recommendations lack explainability.

Explainability in Medicine

The FDA requires transparent evidence of efficacy and safety for treatment approval, a standard that increasingly intersects with calls for explainability in AI-driven medical tools. Think about this situation for a minute. Do you want the best possible treatment for your cancer, based on an AI’s analysis of your genome, your cancer’s DNA, millions of other cases and more data, even if you can’t explain how the computer’s neural network came up with that exact treatment? Or would you rather have a suboptimal treatment whose reasoning you can explain?

Medical explainability will probably become one of the biggest topics of this century.

Correlation Goes Both Ways, Causation Goes One Way

Correlation goes both ways. We can say that mobile phone usage correlates with the number of cancer cases and, equally, that the number of cancer cases correlates with mobile phone usage. Basically, you can swap the two variables. In a causal relationship, we can say that a new marketing campaign caused an increase in sales. But saying that the increase in sales (after the campaign) caused the marketing campaign doesn’t make any sense.

Any causal statement, by definition, is one way. That’s a big clue about whether you’re dealing with correlation or causation.
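The asymmetry shows up when you intervene. The sketch below builds a toy system in which campaign spend drives sales and never the reverse. The function name `simulate`, its `set_campaign` and `set_sales` parameters and all the numbers are invented for this illustration.

```python
import math
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(rng, n, set_campaign=None, set_sales=None):
    """Toy system in which campaign spend drives sales, never the reverse.

    Passing set_campaign or set_sales overrides that variable from the
    outside, which mimics an intervention.
    """
    rows = []
    for _ in range(n):
        campaign = rng.uniform(0, 10) if set_campaign is None else set_campaign
        sales = 50 + 3 * campaign + rng.gauss(0, 2)
        if set_sales is not None:
            sales = set_sales  # overriding sales leaves campaign untouched
        rows.append((campaign, sales))
    return rows


def column_mean(rows, column):
    return mean(row[column] for row in rows)


rng = random.Random(6)  # fixed seed so the run is repeatable

# Observation only: the correlation is the same in either direction.
campaign_obs, sales_obs = zip(*simulate(rng, 2000))
r_cs = pearson(campaign_obs, sales_obs)
r_sc = pearson(sales_obs, campaign_obs)

# Intervention: force one variable and watch the other.
sales_shift = (column_mean(simulate(rng, 2000, set_campaign=10), 1)
               - column_mean(simulate(rng, 2000, set_campaign=0), 1))
campaign_shift = (column_mean(simulate(rng, 2000, set_sales=80), 0)
                  - column_mean(simulate(rng, 2000, set_sales=50), 0))

print(f"r(campaign, sales) = {r_cs:.3f}, r(sales, campaign) = {r_sc:.3f}")
print(f"forcing campaign shifts mean sales by    {sales_shift:.1f}")
print(f"forcing sales shifts mean campaign by    {campaign_shift:.1f}")
```

Swapping the variables leaves the correlation unchanged, so the correlation alone cannot reveal the direction. Only the interventions differ: forcing campaign spend moves sales, while forcing sales leaves campaign spend where it was.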

Causation and the Challenge of Explainability

In economist David Card’s work The Causal Effect of Education on Earnings, Card says that better education is correlated with higher earnings. But the most important thing he says is that if we can’t run an experiment with all other variables held constant, we can’t infer causation from a correlation. We can always bring explainability to the table. But in real life, and with big enough problems, causal claims based on explainability alone are hard to prove. From a scientific viewpoint, they remain hypotheses.

“In the absence of experimental evidence, it is very difficult to know whether the higher earnings observed for better-educated workers are caused by their higher education, or whether individuals with greater earning capacity have chosen to acquire more schooling,” Card wrote.

Do higher earnings cause higher education? Does higher education cause higher earning potential? We don’t know. However, we can make predictions. We can use this correlation to predict an individual’s earning potential based on their education. We can also predict their education based on their earnings.
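Prediction in both directions can be sketched with two least-squares fits. The records below are synthetic with invented numbers (years of schooling, yearly earnings in thousands). The way they are generated is an arbitrary choice, and the fitted lines reveal nothing about which variable drives the other. `fit_line` is a helper written for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


rng = random.Random(7)  # fixed seed so the run is repeatable

# Synthetic records with invented numbers: years of schooling and
# yearly earnings in thousands.
schooling = [rng.uniform(10, 20) for _ in range(1000)]
earnings = [10 + 3 * s + rng.gauss(0, 8) for s in schooling]

slope_e, intercept_e = fit_line(schooling, earnings)  # earnings from schooling
slope_s, intercept_s = fit_line(earnings, schooling)  # schooling from earnings

predicted_earnings = slope_e * 16 + intercept_e   # someone with 16 years
predicted_schooling = slope_s * 60 + intercept_s  # someone earning 60

print(f"16 years of schooling -> predicted earnings {predicted_earnings:.1f}")
print(f"earnings of 60        -> predicted schooling {predicted_schooling:.1f}")
```

Both fits give usable predictions from the same correlation. Neither says what would happen to one person’s earnings if their schooling were changed, which is the causal question.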

Correlation Leads to Good Predictions

Saying that correlation leads to good predictions may sound like a contradiction, given the context of this article. But prediction is about analyzing static historical data sets and using the correlations that exist between observations and outcomes. Predictions don’t change a system. Decisions do. To make decisions, in software development for example, we need to understand what difference it would make to how a system evolves if we take an action or don’t. Decision-making requires a causal understanding of the impact of an action.

We don’t always need a full causal model to make accurate predictions. But to drive effective decisions — especially in dynamic systems — understanding causal relationships can be critical.

Frequently Asked Questions

What is the difference between correlation and causation?

Correlation means two variables tend to change together, while causation means a change in one variable directly causes the change in the other.

How can you determine if a relationship is causal?

To determine if a relationship is causal, you need to conduct a valid experiment with controlled, independent and dependent variables.

Why is confusing correlation with causation a problem?

Confusing correlation with causation can lead to incorrect conclusions and poor decisions, especially in business, medicine and marketing.