How Should Data Impact Your Decision-Making Process?

Contrary to popular belief, no statistician can turn uncertainty into certainty for you. If you’re looking for facts or truth, you won’t find them by adding more equations to the mix; the only way is to collect so much data that you don’t need a statistician. How much is that? Bluntly put, all of it.

So, what’s a statistician for, in that case?

As a trained statistician myself, I’ve seen many people struggle to grasp what it means to do hypothesis testing in the face of uncertainty, so in this article, I’ll take a stab at clarifying those slippery stumbling blocks by taking an unusual approach: a touch of mythology!

How Does Hypothesis Testing Work?

Let’s put ourselves in the shoes of supernatural all-knowing beings … were they to have shoes at all. If you’re a lover of the classics, perhaps you might imagine yourself among the Greek gods and goddesses, looking down upon us mortals from Mount Olympus. In a slight departure from Homer’s deities — ever fallible and bickering — let’s imagine that we know everything about past, present, and future.

There we are on Olympus, watching little mortals go about their daily business, when we notice (aha!) that one of them has a default action. Because we have a perfect grasp of reality, we naturally know all about statistical decision-making, including how to set up a hypothesis test. It’s fun to be all-knowing. So let’s summarize what we know so we can get back to our usual business of laughing at the ridiculousness of the human condition:

Statistical Decision-Making 101

The default action is the option that you find palatable under ignorance. It’s what you’ll do if you’re forced to make a snap decision.
The alternative action is what you’ll do if the data analysis talks you out of your default action.
The null hypothesis is a (mathematical) description of all the realities in which our default action would be a happy choice. If that sounds confusing, here’s a straightforward example.
The alternative hypothesis is a description of all the realities not covered by the null hypothesis.
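
For readers who think better in code, here is a minimal sketch of how those four pieces might fit together for a product-launch decision. Everything specific in it is an assumption made for illustration: the name `choose_action`, the idea that the data are per-customer score differences (new version minus current), the 0.05 risk setting, and the large-sample one-sided z-test are not prescribed by anything above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

DEFAULT_ACTION = "keep current version"    # what we do under ignorance
ALTERNATIVE_ACTION = "launch new version"  # what the data must talk us into


def choose_action(score_diffs, alpha=0.05):
    """Pick an action from per-customer score differences (new minus current).

    Null hypothesis (assumed for this sketch): the true mean difference is <= 0,
    i.e. every reality in which keeping the current version is a happy choice.
    """
    n = len(score_diffs)
    standard_error = stdev(score_diffs) / sqrt(n)
    z = mean(score_diffs) / standard_error
    # One-sided p-value under the boundary of the null (true difference = 0),
    # using a normal approximation.
    p_value = 1 - NormalDist().cdf(z)
    # Only strong enough evidence moves us off the default action.
    return ALTERNATIVE_ACTION if p_value < alpha else DEFAULT_ACTION
```

Notice the asymmetry baked into the last line: weak or ambiguous evidence leaves you on the default action, and only a sufficiently surprising sample triggers the switch.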

A default action is the physical action/decision that you commit to making if you don’t gather any (more) evidence. When I’m performing the role of decision advisor, one of my early questions tends to be, “What would you commit to doing if you knew you had to make the decision right this moment?” If you’re curious to learn more about default actions, I have a whole article for you here.

Picking your default action is one of the most important judgments in a statistical decision process, but I’ve never seen it taught explicitly in a traditional statistics textbook. It is the pivotal concept, though, so I prefer to drag it out into the open and name it explicitly since it will shape the entire hypothesis test.

But hush, the mortal may be about to do something stupid! Let’s watch.

What Can We Learn From Data?

This little mortal’s default action is to not launch a new version of their product. Because we deities are omniscient, we know that it’s the best action to take — we happen to know that the product is expensive to launch but customers won’t like it as much as the current version. Everyone would be better off if the new version were abandoned. Unfortunately, the mortal can’t know this — the poor thing has to live with uncertainty and limited information. But we deities know all things.

We start laughing ourselves silly, “Look at this ridiculous mortal!” If the mortal is too lazy to be bothered with data analysis, they’ll execute the default action (which we happen to know is correct) and everything will be dandy with their little life. But this mortal is not lazy. This mortal is honorable and diligent … and is going to analyze data! They want to be data-driven in all things!!

The mortal goes off to collect market data, then toils and toils over the numbers. What are they going to learn from their foray into statistics?

Well, in the best possible case, they’ll learn nothing. That, in a nutshell, is the right thing to learn when you perform a hypothesis test and find no evidence to reject your null hypothesis in favor of the alternative, which would have triggered a switch away from the default action. If that explanation zoomed past you too quickly, I’ve got a gentle primer on the logic of hypothesis testing for you here.

Learning nothing would be a wonderful outcome for this little mortal, though it’s hard on the emotions — it’s awfully disappointing to spend all that effort on data analysis and come away without anything that feels like a eureka moment serenaded by celestial trumpets. But the important thing is that the poor dear will end up taking the right action. They won’t know it’s the right action (that would be, as Shakespeare might put it, more than mortal knowledge), but they’ll end up in the same happy place as the couch potato who spent their time binging Netflix. Ah, these mortals and their Sisyphean data analysis. They were going to do the right thing anyway!

What Is the Point of Statistical Analysis?

The key insight for you, dear reader, is that the poor mortal can’t possibly know this. That’s why we are doing this rather odd exercise of putting ourselves on Mount Olympus. It’s not a normal perspective for you to take during your statistics-article-reading loo break.

What we have been snickering at so far was the absolute best case for this mortal’s data analysis: learning nothing at all and performing their default action. Now, what’s the worst possible thing this little mortal could learn from the data? Something!

Because upon learning something, they will do … something stupid. In classical statistical inference, learning something about the population means rejecting the null hypothesis, feeling ridiculous about the default action, and switching their course of action to the (incorrect) alternative. This mortal will pat themselves on the back for statistical significance and launch a bad product.

That’s so tragicomic that we’re falling out of our supernatural chairs with laughter.

Thanks to the silly mortal’s data-driven diligence and their mathematical savvy, they’ve managed to talk themselves out of doing what they should have done. If they’d been lazy, they would have been better off! Puny mortals, so brave and so good — so hilarious.

Why would a mortal end up learning something incorrectly like that? Unfortunately, randomness is random. There’s a luck of the draw element to data. Your sample might be a freak accident that leads you to the wrong conclusion. Alas, when luck’s involved, bad things can happen to good people.
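
If you’d like to watch the luck of the draw in action, the following is a rough simulation sketch. It assumes a world in which the new version is truly no better than the old one (a true mean difference of exactly 0, the edge of the null hypothesis), normally distributed customer scores, and a one-sided z-test at a risk setting of 0.05. The function names, sample size, and number of trials are all hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def rejects_null(sample, alpha):
    """One-sided z-test of H0: true mean <= 0 (large-sample approximation)."""
    z = mean(sample) / (stdev(sample) / sqrt(len(sample)))
    return 1 - NormalDist().cdf(z) < alpha


def false_switch_rate(true_diff=0.0, n=50, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated mortals who abandon a correct default action."""
    rng = random.Random(seed)
    switches = 0
    for _ in range(trials):
        # Each mortal draws their own random sample from the same reality.
        sample = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        if rejects_null(sample, alpha):
            switches += 1
    return switches / trials


print(false_switch_rate())                # some mortals get unlucky
print(false_switch_rate(true_diff=-0.5))  # far fewer when the product is clearly worse
```

Every simulated mortal here did the analysis properly. The ones who switched simply drew a freak sample.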

Omniscient beings may have the privilege of reasoning in advance about what the right decision is, but mortals aren’t so lucky. Mortals must contend with uncertainty and incomplete information, which means they can make mistakes. Unlike supernatural, omniscient beings, people haven’t got enough information to say which actions are correct or not.

They’ll only find that out later, in hindsight, once the universe catches up with them. In the meantime, all people can do is make the best decision they can with the incomplete information they have. Sometimes that leads them off a cliff. Uncertainty is a jerk like that, which is why I do sympathize with the desire to shower data gurus with boatloads of cash in the hopes that they’ll make the uncertainty go away. But there’s a name for someone who promises you that certainty is for sale when your data is incomplete: a charlatan. Unfortunately, data charlatans are everywhere these days. Buyer beware!

Am I Making Decisions Intelligently?

So, let’s put ourselves back in the shoes we belong in: those of puny mortals. All we have is our data set, which — when we’re doing statistics — is an incomplete snapshot of our world. We don’t have the facts that allow us to be sure we’re making the right decision. We can’t know if we’ve made a mistake or not until it’s too late. We only know what our data set looks like.

It’s always possible that, thanks to uncertainty, all our mathematical huffing and puffing talks us out of a perfectly reasonable default action. We can never be sure that we’re not making the deeply embarrassing error of toiling ourselves into a worse decision than what we would have had by spending the time with a trashy novel instead of with our data. We must remember that we are not gods and thus we haven’t got the privilege of reasoning as though we’re all-knowing.

That’s why we mortals must ask ourselves, “Am I making the decision intelligently?” instead of “Am I making the right decision?”

The mortal’s mistake — actively mathing themselves into a stupid course of action — adds insult to injury. There’s an asymmetry here that makes the mistake extra painful. It comes from the fact that they have a preferred default action in the first place. Our default action is what we fundamentally lean towards doing, even under ignorance.

The analysis would be different if there were no default action. In that case, staying lazily on the couch would not be an option — indifference between options forces you to glance at the data, but it lets you get away with a less involved approach, as I explain here.

Alas, true indifference seems rather rare in the human animal. We often enter the decision-making process with a preference for one course of action over another, which means we do have a default action that represents a happy, comfortable choice we’d need the data to talk us out of.

In short, we’re hardly indifferent. If we were, we’d be doing this a different way: we’d simply make a best guess based on the data. It wouldn’t matter how much data you used; you’d just grab hold of as much as you can afford and go with the best-looking action, statistical significance be damned.
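
To make the contrast concrete, here is a tiny sketch of what that best-guess approach could look like for a truly indifferent decision-maker. The function name `best_guess` and the dictionary of observed outcomes per action are assumptions for illustration, as is the choice of the sample mean as the measure of "best-looking."

```python
from statistics import mean


def best_guess(outcomes_by_action):
    """Return the action whose observed outcomes have the highest average.

    No null hypothesis, no significance threshold: with no default action
    to protect, the best-looking option simply wins.
    """
    return max(outcomes_by_action, key=lambda action: mean(outcomes_by_action[action]))


observed = {
    "option A": [3.1, 2.8, 3.4],
    "option B": [3.3, 3.0, 3.5],
}
print(best_guess(observed))
```

There is no risk dial in this version because there is no cozy default to be wrongly talked out of.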

By preferring one of the actions by default, you’re making a value judgment about what you consider to be the worst mistake you can make: stupidly leaving this cozy default action. You’re only okay with abandoning it if you get a strong-enough cease-and-desist signal from your data. Otherwise, you’re happy to stay there. You’d appreciate avoiding the mistake of staying with a bad default action, but that situation doesn’t represent as grievous a wound to you as the other mistake you could make: leaving your comfort zone incorrectly.

Statistics Is the Science of Changing Your Mind

Statistics is the science of changing your mind, and the mechanics of its most popular methods are powered by the imbalance in your preferences about the actions that are on the table for you. The worst possible thing that you could do is talk yourself into stupidly changing your mind. And yet, because randomness is random, you could get some bad luck — it’s entirely possible that this will be exactly what happens to you. You’re a mere mortal, after all. Is there anything you can do about it? Sort of. You can’t guarantee you’ll make the right decision, but you can turn the dial on the size of the gamble you’re willing to take. It’s your best attempt at indemnifying yourself against the vagaries of chance.

In fact, that’s the main payoff to formal statistical hypothesis testing: it gives you control over the maximum up-front probability of stupidly changing your mind. It allows you to search your soul, discover your own appetite for risk, and make your decision in a way that delivers the action that best blends your data, your assumptions, and your risk preferences.
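
Here is a small sketch of what turning that dial might look like. It assumes a one-sided z-test of the null hypothesis "true difference <= 0" with a known standard deviation, so the probability of switching can be computed directly rather than simulated. The function name, the sample size of 50, and the specific risk settings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_of_switching(true_diff, alpha, n=50, sigma=1.0):
    """Probability that the test rejects the null, given the true difference.

    Assumes a one-sided z-test of H0: diff <= 0 with known sigma.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)  # evidence bar set by the risk dial
    shift = true_diff * sqrt(n) / sigma       # where reality centers the z-statistic
    return 1 - NormalDist().cdf(z_crit - shift)


for alpha in (0.01, 0.05, 0.20):
    worst_case = prob_of_switching(0.0, alpha)       # edge of the null
    deeper_in_null = prob_of_switching(-0.2, alpha)  # default is clearly right
    print(f"alpha={alpha:.2f}  worst case={worst_case:.3f}  true diff -0.2: {deeper_in_null:.3f}")
```

Under these assumptions, the worst case over all the realities covered by the null sits at its edge, where the chance of wrongly switching equals the risk setting you chose. Anywhere deeper inside the null, the chance is smaller. That is the sense in which the setting is a maximum.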

When you’re dealing with uncertainty, truly knowing is not for us mortals. You can’t make certainty out of uncertainty and you certainly can’t get it by paying a statistician to mumble some equation-filled mumbo jumbo for you. All you can know is how your data pans out in light of your assumptions. That’s what your friendly statistician helps you with. Hypothesis testing is a decision tool. What it gives you is powerful but not perfect: the ability to control your decision’s risk settings mathematically.

Statistics Offers Control, Not Certainty

To summarize our discussion: you’d have to have omniscience to know if your decision is correct. Until it’s too late, of course. With uncertainty and partial information (a sample of your population), mistakes are possible even though you’ll have done the best you can with what little you know. There’s always the possibility that you — dear little mortal — will make the terrible mistake of analyzing yourself out of a perfectly good default action. What statistics allows you to do is to control the probability of that sad event.

I may be biased in my love for statistics — I’ve been a statistician since my teens, after all — but when there are really important data-driven decisions to make, I’m deeply grateful that I’m able to have control over the risk and quality of my decision process. This is why I’m frequently baffled that decision-makers engage in the pantomime of statistics without ever availing themselves of that control panel. It defeats the entire point of all that mathematical jiujitsu!
