In his 1974 Cal Tech commencement speech, famous American physicist Richard Feynman, discussing the correct application of the scientific method, stated that “The first principle is that you must not fool yourself — and you are the easiest person to fool.” This hackneyed quote has been used and re-used over the ensuing years by all manner of behavioral researchers, social scientists, pop-psychology gurus and every species of dilettante to describe scientific self-deception that dresses up empirical conclusions in the veneer of scientific objectivity without subscribing to science’s necessary underlying rigor.
As is almost always the case, however, the real gold among the pyrite in this part of Feynman’s commencement speech came two sentences later: “After you’ve not fooled yourself, it’s easy not to fool other scientists. You just have to be honest in a conventional way after that.”
Without questioning the weirdness of the quote’s logical ordering (which, among other things, seems to preclude the entire edifice of peer review), Feynman alluded to an underappreciated, yet crucial, aspect of research. Researchers must define exactly what assumptions they’re making in the course of their work.
Specifically, explicitly stating simplifying assumptions, why they are being made, and ways in which these assumptions positively or negatively impact results are methodological practices that need more focus by researchers than they currently receive. Doing so not only better ensures the veracity of empirical results but also improves researchers’ assumption-making processes. That way, even without substantial results, the work benefits other researchers and readers by improving their own understandings of which suppositions are more or less warranted and why they might be so.
Fix Your Research!
Be Clear When You Clean (Data)
This principle is true both in data-processing decisions and modeling. Before researchers can conduct estimation, they must clean, format, and explore their data for anomalies. Researcher decisions regarding dropping outliers, mathematical transformations, or imputing missing data can all impact model results and conclusions.
For example, one recent work highlighted how, when given data and asked to address two specific economics research questions, various groups of researchers made many different choices about how to process data and define variables. Moreover, none of the final data sets researchers used to model results had the same sample sizes. As the authors of the study pointed out, researcher decisions in initial data cleaning, variable construction, and model assumptions yielded highly disparate statistical estimates and, subsequently, conclusions. Worse, none of these initial assumptions pertaining to variable selection, data pre-processing and transformation, or modeling were even documented, much less explained.
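To make the point concrete, here is a minimal Python sketch. The numbers are made up for illustration and are not taken from the study described above. The cutoff of 10 for an “outlier” and the choice of median imputation are likewise assumptions of this example. It shows how three defensible cleaning choices, applied to the same raw data, can yield three different sample sizes and three different averages.

```python
from statistics import mean, median

# Made-up measurements for illustration only; None marks a missing value.
raw = [4.1, 3.8, 4.4, None, 4.0, 19.5, 3.9, 4.2]
OUTLIER_CUTOFF = 10.0  # hypothetical threshold, chosen only for this example

observed = [x for x in raw if x is not None]
fill = median(observed)  # value used to fill the gap under median imputation

# Three cleaning decisions a researcher might reasonably make.
cleaned = {
    "drop missing, keep outlier": observed,
    "drop missing, drop outlier": [x for x in observed if x <= OUTLIER_CUTOFF],
    "impute median, keep outlier": [fill if x is None else x for x in raw],
}

# Sample size and mean under each decision.
results = {name: (len(values), mean(values)) for name, values in cleaned.items()}

for name, (n, average) in results.items():
    print(f"{name}: n={n}, mean={average:.2f}")
```

With these invented numbers, the three choices give sample sizes of 7, 6, and 8 and means of roughly 6.27, 4.07, and 6.00. None of the choices is obviously wrong, which is exactly why each one needs to be written down.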
If empirical inquiry and the scientific process are a collaborative effort, then undocumented or unwitting assumptions that lead different expert teams to opposite conclusions about the same problem are highly problematic for current researchers, who rely on previous work to inform their own. Furthermore, these problems call into question the entire edifice of science as a self-correcting endeavor. Researchers’ failures to show their work, whether witting or unwitting, not only cast doubt on their own conclusions but also undermine trust that the process of empirical inquiry will eventually produce facts about how the world works.
The Psychology of Clarity
Earlier work in psychology also confirms how unspecified assumptions can lead to high false-positive rates for statistically significant findings. There are two reasons for this. First, researchers who are doing original work have a tremendous amount of freedom in how they carry out their projects. Second, researchers’ desire to turn up something significant primes them to find certain answers. After all, who would want to dedicate months or years to investigating an original idea only to have it be a dead end?
Although science thrives on such original work, scientists generally do not attempt to disprove their own theories, and they are certainly not rewarded for doing so. Therein lies the problem, according to the authors. Researchers working under ambiguous assumptions and a clear incentive to produce non-trivial results have a tendency to rationalize (and report) impactful empirical results while discarding non-impactful outcomes.
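One such freedom is deciding when to stop collecting data. The simulation below is a rough sketch, not a reproduction of the authors’ analysis. It assumes a perfectly fair coin, a hypothetical researcher who checks an exact two-sided binomial p-value after every flip from the 10th through the 50th, and a 5 percent threshold. All of those settings, and the function names, are choices made for this illustration. It compares how often “significance” turns up at any peek versus only at the planned end.

```python
import random
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def two_sided_p(n: int, heads: int) -> float:
    """Exact binomial p-value for a fair coin: double the smaller tail, capped at 1."""
    k = min(heads, n - heads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


def simulate(n_sims=2000, min_flips=10, max_flips=50, alpha=0.05, seed=0):
    """Return (fixed_rate, flexible_rate) of false positives for a fair coin."""
    rng = random.Random(seed)
    fixed_hits = 0     # significant when tested once, after max_flips
    flexible_hits = 0  # significant at any peek from min_flips onward
    for _ in range(n_sims):
        heads = 0
        peeked_significant = False
        for n in range(1, max_flips + 1):
            heads += rng.random() < 0.5  # one flip of a fair coin
            if n >= min_flips and two_sided_p(n, heads) < alpha:
                peeked_significant = True
        fixed_hits += two_sided_p(max_flips, heads) < alpha
        flexible_hits += peeked_significant
    return fixed_hits / n_sims, flexible_hits / n_sims


fixed_rate, flexible_rate = simulate()
print(f"test once at 50 flips:  {fixed_rate:.3f}")
print(f"stop at any p < 0.05:   {flexible_rate:.3f}")
```

Because both analyses run on the same simulated flips, the peeking researcher can never do worse at finding “significance” than the one who tests once. Any gap between the two rates is produced entirely by an unstated stopping rule, since the coin is fair in every run.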
The study also advocates a checklist for those conducting research. Five of these six guidelines advocate clearly stating assumptions about data collection, variable construction and failed estimates. They also advocate reporting estimates with included and excluded data and variables. By requiring authors to carry out alternative analyses, this approach goes one step further than merely stating simplifying assumptions does.
Don’t Deceive Yourself
An even more insidious form of hidden assumptions affecting outcomes can occur when researchers aren’t aware that they’re making assumptions. Needless to say, these hidden assumptions go unacknowledged by the researchers inadvertently making them.
For instance, p-values are typically used to judge whether an effect is “statistically significantly different from zero.” A p-value, succinctly, is the probability of seeing outcomes at least as extreme as the observed data, given that a specific initial condition (typically called a “null hypothesis”) is true. These statistics are computed as a step in an overall epistemic process called “null hypothesis significance testing.”
Frequently, though, hidden assumptions are buried in the computations of p-values. One specific data-collection or testing scenario might differ from another but produce identical data. Even wackier, a failure to account for unknown differences in the context in which the data are produced can change the p-values, even though the data on which they are based are the same.
An illustrative example is collecting coin-flip outcomes to determine if a coin is “fair,” meaning that it has a 50/50 chance of coming up heads or tails. In this case, the null hypothesis is that the probability of a heads flip is 50 percent. Collected data indicate that out of 12 flips, three came up heads. The probability of seeing three or fewer heads from a fair coin turns on a piece of information that is buried in the term “collected.”
There are at least two ways to collect data that can produce these outcomes. A researcher could dutifully have flipped a coin 12 times and recorded three heads and nine tails in whatever order these results occurred. Alternatively, a researcher could have decided to stop collecting data on coin flips when three heads were recorded. The process of flipping the coin did not need to stop at 12, but in this circumstance, that was the number of flips that produced three heads for the researcher. This distinction of when the data-collection process ends seems pedantic, but in fact the first stopping rule produces a different p-value than the second rule.
Not knowing the specifics of when and how data collection ended and mindlessly ascribing observed outcomes to the wrong stopping rule skews the p-value. This can lead to mistaken insight. In this example, the first process has a p-value of 7 percent, and the second has a p-value of 3 percent. More importantly, the typical threshold for rejecting a null hypothesis that the coin is fair is a p-value of 5 percent or lower.
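The two figures can be checked with a short calculation. The sketch below assumes a one-sided test, which is the reading that matches the 7 and 3 percent figures quoted here. The function names are invented for this example. Under the first rule, the number of flips is fixed at 12 and the p-value is the chance of three or fewer heads. Under the second, flipping continues until the third head, and the p-value is the chance of needing 12 or more flips, which is the same as seeing at most two heads in the first 11.

```python
from math import comb


def p_value_fixed_flips(n_flips: int, n_heads: int, p: float = 0.5) -> float:
    """P(heads <= n_heads) when the number of flips is fixed in advance (binomial)."""
    return sum(
        comb(n_flips, k) * p**k * (1 - p) ** (n_flips - k)
        for k in range(n_heads + 1)
    )


def p_value_stop_at_heads(n_flips: int, n_heads: int, p: float = 0.5) -> float:
    """P(flips needed >= n_flips) when flipping stops at the n_heads-th head.

    Needing at least n_flips flips means the first n_flips - 1 flips
    contained at most n_heads - 1 heads.
    """
    return sum(
        comb(n_flips - 1, k) * p**k * (1 - p) ** (n_flips - 1 - k)
        for k in range(n_heads)
    )


fixed = p_value_fixed_flips(12, 3)         # 299 / 4096
sequential = p_value_stop_at_heads(12, 3)  # 67 / 2048

print(f"flip exactly 12 times: p = {fixed:.4f}")       # 0.0730
print(f"flip until third head: p = {sequential:.4f}")  # 0.0327
```

The data are identical in both cases: 12 flips, three heads. Only the assumed stopping rule differs, and that alone moves the p-value from one side of the conventional 5 percent line to the other.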
From the same nominal experiment above, one of the results is statistically significant and one of the results is not. Worse still, not knowing which data collection process the coin-flip results came from, the researcher can choose either way of reporting the data. There is a clear incentive, given this ambiguity, to choose the statistically significant results from the latter data collection process even though it might be a mistaken assumption about how the flips were carried out.
The problem in this example is that the researcher’s non-awareness of how data stopped being recorded substantially alters the conclusions the researcher would draw. But this stopping-rule assumption is hardly ever understood or acknowledged by the researcher as an assumption they actively are making. Ignorance, in these circumstances, does not alleviate the problem that incorrect assumptions can produce false or misleading conclusions. What’s worse is that ignorance of suppositions can be even harder to ferret out than researchers simply not documenting the assumptions they explicitly make. In the words of Vincent Ludwig, “There is an even more ideal assassin — one who doesn’t KNOW he’s an assassin.”
Assumptions about data, variables, methodology, and estimation allow for necessary simplifications of the world so that we can empirically understand it. Researchers must have the ability to make simplifying assumptions when appropriate. This ability comes at the cost of keeping track of what assumptions the researchers are making and how these assumptions can lead to spurious conclusions, however. Both researchers themselves and the entities that ensure research quality have to document and check the assumptions made, where possible. Further, considering what the conclusions might have looked like should major assumptions have been made differently deserves more than just an afterthought.
Indeed, the previously linked paper by Simmons et al. included a list of six recommendations for researchers and four for academic reviewers. These six recommendations are an excellent starting point for ensuring quality empirical estimation. They also serve as a good way to organize thinking around when assumptions need to be made and when they do not.
Most of these guidelines focus on researcher disclosure. Disclosing model assumptions, estimation, and variable lists also alleviates researchers inadvertently making assumptions they might not have known they were making in the first place. The most important of these recommendations are researchers deciding on and documenting what the data stopping rule will be before data collection begins; listing all variables collected even if they are not used in the study; including results that incorporate outlying data to compare to results without outliers; and reporting all failed model estimates in addition to successful ones.
One thing the authors of the False-Positive Psychology paper miss in their recommendations is that many of these considerations could be accomplished by requiring researchers to pre-register their studies. This is the practice by which researchers document their data-collection plan, variables under consideration, and the models they intend to use prior to carrying out data collection and modeling. In this way, researchers might receive feedback on the explicit and implicit assumptions they are making in their research plan prior to dedicating their time and resources to carrying it out.
Even better would be if stakeholders and decision-makers who depend on research conclusions had an early and active role in research ideation. This role includes asking questions of researchers, identifying what data interested parties can supply, and determining which research techniques would be of most benefit given the context of the decisions that must be made. Deferring to expert judgment is essential to ensure appropriate statistical rigor is brought to bear.
That should not mean that experts need to be excused from adequately explaining what they are supposing, why, and what the possible drawbacks of the methods they employ are, however. Non-expert stakeholders should trust in the credentials of their research teams. But they should also not feel intimidated by data or unqualified to ask for clarification around what researchers are assuming versus what researchers can numerically verify.
Clarity Is Key
Understanding the thought process behind how research should take place, including when and why to assume things and what those assumptions should be, is vital to ensuring a reliable research process. Although results sometimes do not amount to anything substantial and mistakes often occur, greater documentation of core tenets of empirical methodology, like listing all variables collected, visualizing all the data, showing results with outlying data included as well as excluded, and disclosing how data were collected, helps ward off empirical researchers practicing what Feynman referred to in his commencement speech as “cargo cult science.”
Unwarranted or unknown assumption-making sabotages quality statistical insights and can lead to huge miscalculations in judgment by individuals relying on that quality. Researchers should not strive to be the bumbling economist who, when asked by scientific colleagues how to escape from a deserted island, responds, “First, assume a boat....”