How to Mind Goodhart’s Law and Avoid Unintended Consequences

One of the first steps to solving a problem — data science or otherwise — is determining the right metrics to gauge success. Choose wisely.

Written by Will Koehrsen
Image: Shutterstock
UPDATED BY
Matthew Urwin | May 30, 2023

To increase revenue, the manager of a customer service call center starts a new policy: Rather than being paid an hourly wage, every employee is compensated based only on the number of calls they handle. After the first week, the experiment seems like a resounding success. The call center is processing twice the number of calls per day! The manager, who never bothers to listen to his employees’ conversations as long as their numbers are good, is quite pleased.

However, when the boss stops by, she insists on going to the call floor, and when she does so, both she and the manager are shocked by what they hear: The employees answer the phone, issue a series of one-word responses and slam the phone down without waiting for a good-bye. No wonder the number of completed calls has doubled! By judging performance only by the volume of calls, the manager unintentionally incentivized employees to value speed over courtesy. Unknowingly, he has fallen for the phenomenon known as Goodhart’s Law.

What Is Goodhart's Law?

Goodhart’s Law is expressed simply as: “When a measure becomes a target, it ceases to be a good measure.” In other words, when we set one specific goal, people will tend to optimize for that objective regardless of the consequences. This leads to problems when we neglect other equally important aspects of a situation.

Our call center manager thought increasing the number of calls processed was a good objective, and his employees dutifully strove to increase their numbers. However, by choosing only one metric to measure success, he motivated employees to sacrifice courtesy in the name of quantity. People respond to incentives, and our natural inclination is to maximize whatever measure we are judged by.


Examples of Goodhart’s Law in Action

Once we know Goodhart’s Law, we can recognize, and sometimes minimize, its effect in numerous areas of our lives. In school, we have one objective: maximize our grades. This focus on one number can be detrimental to actual learning. High school seemed like one long series of memorizing content for a test, then promptly forgetting it all so I could stuff my brain full of info for the next one, without any consideration of whether I really knew the concepts. This strategy worked quite well given how we measure success in school, but I doubt it is the best approach for quality education.


Another area in which we see the detrimental effects of Goodhart’s Law is in the academic world, where there is an emphasis on publishing as indicated by the phrase “publish or perish.” Publishing is often dependent on achieving a positive result in a study, which leads to the technique known as “p-hacking,” where researchers manipulate or subset experimental results to achieve statistical significance. Memorizing content rather than learning concepts and p-hacking are both unintended consequences that arise when we use a single number to gauge success.
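To see why testing many subsets of the same data makes a "significant" result easy to find, here is a minimal sketch in Python. It assumes every test is independent and that there is no real effect at all, so each p-value is uniformly distributed between 0 and 1. The function names, the 0.05 threshold and the number of simulated studies are illustrative choices, not part of any particular study.

```python
import random


def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests of a
    true null hypothesis falls below alpha purely by chance."""
    return 1 - (1 - alpha) ** n_tests


def simulate_false_positive_rate(n_tests, alpha=0.05, n_studies=10_000, seed=0):
    """Estimate the same probability by simulation.

    Assumption: under a true null hypothesis a p-value is uniformly
    distributed on [0, 1], so each one is drawn with rng.random().
    """
    rng = random.Random(seed)
    studies_with_a_hit = sum(
        1
        for _ in range(n_studies)
        if any(rng.random() < alpha for _ in range(n_tests))
    )
    return studies_with_a_hit / n_studies


for n_tests in (1, 5, 20):
    exact = chance_of_false_positive(n_tests)
    simulated = simulate_false_positive_rate(n_tests)
    print(f"{n_tests:>2} subsets tested: exact={exact:.3f}, simulated={simulated:.3f}")
```

Under these assumptions, a researcher who tries 20 subsets has roughly a 64 percent chance of finding at least one result below the 0.05 threshold, even though nothing real is there. Once the p-value becomes the target, it stops measuring what it was meant to.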

 

Implications of Goodhart’s Law

From a data science perspective, Goodhart’s Law reminds us of the need for proper metrics. When we design a machine learning model or make changes to the interface of a website, we need a way to determine if our solution is effective. We’ll often use one statistic, such as mean-squared error for regression or F1 score for classification problems. If we realize there may be detrimental consequences of using only a single measure, we might think again about how we assess success. 

Much as the call center manager would be better off judging employee performance based on a combination of the number of calls handled and customer satisfaction, we can create better models by considering several factors. Instead of assessing a machine learning method only by accuracy, we might also consider interpretability, so we create understandable models.

For example, instead of using a deep neural network with incredible accuracy that we don’t fully understand, we may use a slightly less accurate random forest whose predictions can be explained with Shapley values. We see this in practice if we consider the European Union’s GDPR, which is often read as giving individuals a right to an explanation of automated decisions that affect them.
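A single number can also mislead on its own terms. The sketch below uses a made-up, heavily imbalanced dataset (990 negatives and 10 positives) and a hypothetical "model" that always predicts the negative class. The helper `classification_report` is written here for illustration and is not a library function.

```python
def classification_report(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 990 negative examples and 10 positive examples.
y_true = [0] * 990 + [1] * 10
# A "model" that ignores its input and always predicts the negative class.
always_negative = [0] * len(y_true)

report = classification_report(y_true, always_negative)
for name, value in report.items():
    print(f"{name:>9}: {value:.2f}")
```

Judged by accuracy alone, this model scores 0.99 and looks excellent. Its recall is 0.00, meaning it never finds a single positive case. Reporting the metrics side by side exposes a failure that the headline number hides.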


Goodhart’s Law: Finding Balance

Although most people want to hear a single number to summarize an analysis, in most situations, we are better off reporting multiple measures (with uncertainty intervals). There are times when a single well-designed metric can encourage the behavior we want, such as in increasing savings rates for retirement, but it’s essential to keep in mind that people will try to maximize whatever measurement we choose. If we end up achieving a single goal at the expense of other, equally important factors, then our solution might not help the situation. 
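One simple way to attach an uncertainty interval to a measure is the percentile bootstrap: resample the evaluation results with replacement many times and read the interval off the spread of the recomputed metric. The sketch below is one possible approach, not the only one. The data (85 correct predictions out of 100) is invented, and `bootstrap_interval` is a hypothetical helper defined here.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def bootstrap_interval(values, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(values).

    Resamples `values` with replacement n_resamples times, recomputes the
    statistic each time, and returns the alpha/2 and 1 - alpha/2 percentiles.
    """
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int(n_resamples * alpha / 2)]
    upper = estimates[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical evaluation results: 1 = correct prediction, 0 = incorrect.
correct = [1] * 85 + [0] * 15

accuracy = mean(correct)
low, high = bootstrap_interval(correct)
print(f"accuracy = {accuracy:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

Because `stat` is a parameter, the same function can wrap any metric computed from a list of results, so each measure in a report can carry its own interval instead of a bare point estimate.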

One of the first steps to solving a problem — data science or otherwise — is determining the right measure to gauge success. When we want to objectively find the best solution, we should recall the concept of Goodhart’s Law and realize that rather than using a single number, the best assessment is usually a set of measurements. By choosing multiple metrics, we can design a solution without the unintended consequences that occur when we optimize for a narrow objective.
