This past October, the Royal Swedish Academy of Sciences awarded the yearly Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel to three U.S.-based economists for their contributions to labor economics and econometrics. Joshua Angrist, Guido Imbens and David Card are admittedly not household names for most data practitioners. But the most important theme of these three economists’ work (and that of their frequent collaborator, Alan Krueger, who surely would have shared in the prize but for his tragic suicide in 2019) should be familiar. They have shown that careful statistical analysis can produce credible estimates of the causal effects behind patterns in data.
These four researchers essentially started the “credibility revolution” in econometrics. The movement’s major focus was how best to design empirical research to get at causal explanations using statistical evidence. Their work on research design not only provides insight into why various things occur but also suggests an alternative to the current trend of collecting huge amounts of disparate data to train algorithms to sort out the factors responsible for data regularities.
This alternative is particularly attractive when weighed against the cost of putting big-data infrastructure in place, and it speaks directly to business-relevant questions. Will a change in 401(k) vendors alter workforce retirement-savings allocations? Which website design leads to more online purchases? What types of non-monetary benefits best reduce worker turnover?
These and many other questions don’t require the use of data science methods for empirical understanding. Rather, well-designed research based on thinking experimentally can yield meaningful insights at a lower cost.
The Credibility Revolution and Experimental Design
Data Analysis Gets Expensive
Many companies simply underestimate the costs of establishing or expanding their data science and AI capabilities. Their common mistake is assuming that data collected and disseminated for financial or other compliance-reporting processes can simply be expanded to generate data sets large enough to train AI-based algorithms for predictive inference.
This type of cross-pollination is usually impossible. Data collected for compliance or reporting purposes are typically custom-built for the department or line of business that needed them. Data collected for AI-caliber analysis, on the other hand, require an entire hierarchy of components for regular, enterprise-wide (or even extra-enterprise) data collection, processing and dissemination. Not only must data be collected and structured across disparate parts of an organization, but data from outside the organization (e.g., census figures, unemployment statistics, or other federal, state, or local metrics) must also be collected, updated and assembled to fit the company-specific data architecture.
Furthermore, these data must be designed specifically for use in predictive inference rather than for end use in a dashboard or Excel spreadsheet. All of this is to say that each level of this hierarchy of AI needs presents its own challenges, and even the most data-savvy companies can find it difficult to surmount them cost-effectively.
Moreover, even if companies succeed in streamlining and scaling their data-collection processes enough to get to regularized estimation and prediction using big data, they might still find themselves limited by the quality of the initial methods by which the data were assembled. Poor data quality cannot be overcome by collecting more poor-quality data. Biased sampling, poorly defined metrics, and incorrectly or inappropriately applied methods do not go away merely by virtue of a functioning data-production pipeline.
Experimentation Without Big Data
Our credibility revolutionaries can help provide an alternative to these large-scale, difficult-to-implement data processes. Although getting the relevant data-collection architecture in place to support big-data insight in the short run (or at all) might not be feasible, enterprises shouldn’t write off their ability to collect empirical evidence and gain credible insights. In fact, a big reason that this year’s Nobelists were awarded their prizes was their work on identifying how disruptive, random events can mimic the random assignment of patients to treatment and control groups in clinical trials.
Of course, a random, disruptive event cannot be designed in advance. But because these events (pandemics, wars, terrorist attacks, natural catastrophes) are idiosyncratic, the disparate impacts felt by groups of people exposed to them function as a society-wide random-assignment mechanism akin to that of a clinical trial. The key to answering causal questions is comparing how similar groups of people behaved before and after their exposure to the event. This approach turns on the assumption that, but for the event’s occurrence, the quasi-treatment and quasi-control groups would have followed similar trajectories.
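This before-and-after comparison across exposed and unexposed groups is the difference-in-differences estimator. A minimal sketch, with purely illustrative numbers:

```python
# Difference-in-differences: subtract the quasi-control group's change over
# time from the quasi-treatment group's change. If the groups would have
# followed parallel trajectories absent the event, what remains is an
# estimate of the event's effect. All numbers below are illustrative.

def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Estimated effect = (treatment change) - (control change)."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical mean outcomes before and after a disruptive event:
effect = diff_in_diff(treat_pre=10.0, treat_post=7.0,
                      control_pre=10.5, control_post=10.0)
print(effect)  # -2.5: the event is estimated to lower the outcome by 2.5
```

The subtraction of the control group’s change is what distinguishes this from a naive before/after comparison: it nets out whatever trend both groups would have experienced anyway.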
To put these methods in context, a classic problem in social science research is the interaction between the number of police officers assigned to a specific area and changes in crime rates there. The difficulty in teasing out whether police presence affects crime arises because more officers are often assigned to an area precisely in response to elevated crime rates there. In this case, if a researcher simply ran a statistical analysis of police increases on crime rates, the correlation between the two variables would be positive.
Here the statistical analysis obfuscates rather than clarifies the causal story. The positive correlation suggests that the increase in police caused an increase, rather than a decrease, in crime. This is not only misleading but also answers the wrong question: what happened after police were reassigned? The appropriate question for estimating a direct effect is what would have happened to crime rates had police not been reassigned.
Two economists sought to answer the latter question by treating a natural, disruptive event as a proxy for the random assignment of more police patrols. Jonathan Klick and Alex Tabarrok made the case that changes in the Homeland Security Advisory System (HSAS), the color-coded schematic used to inform public agencies of the threat of terrorist attacks, represented just such a disruptive event for police presence. They estimated that the increase in police presence due to increases in the HSAS threat level — an increase, crucially, that was not related to local crime rates — nevertheless caused crime rates to fall. By harnessing these random increases and decreases in terrorist-attack probabilities (random in the sense that no one could predict when they would occur), Klick and Tabarrok were able to break through the circular “chicken-and-egg” logic of police and crime rates and show that increases in police patrols caused crime rates in affected areas to drop.
David Card, one of this past year’s prize winners, provided another example of using a “natural experiment” to shed light on economic outcomes. Card investigated wage and unemployment effects after the 1980 Mariel boatlift, which began with Fidel Castro’s announcement on April 20, 1980 that any Cubans who wanted to leave the country could do so by boat from the Cuban port of Mariel. Over 120,000 Cubans emigrated, 60,000 of whom settled in Miami from May to September 1980. This sudden influx of mostly low-skilled young men, according to Card’s argument, represented a natural experiment: although the boatlift was not in any way associated with labor market conditions in Miami, it nevertheless should have had huge impacts on Miami’s workforce.
Economic theory suggests that a sudden increase in labor supply without an offsetting increase in labor demand should result in lowered wages and increased unemployment as more workers compete for the same number of existing jobs. These effects would have been particularly acute in jobs employing low-skilled young men and, especially, among non-Mariel Cuban workers within the same demographic categories as the Mariel emigres.
Card compared wage and unemployment rates in Miami from 1979 to 1985, and compared those rates with the same demographic groups of workers in a selection of other American cities untouched by the refugee influx. His results were surprising: wage and unemployment trends in Miami were minimally different from those of the other cities, both before and after the Mariel refugees’ arrival. Card proposed several potential reasons for the lack of a strong effect. The enduring impact of his work, however, is that it pioneered the use of naturally occurring events in a quasi-experimental fashion to investigate the causes of economic outcomes.
Good Experimental Protocols
Whether in a political-refugee situation, a change in compulsory school-attendance ages, or the close geographic proximity of labor markets with different minimum wage laws, identifying sources of randomness that can roughly mimic random assignment in clinical studies represents an excellent opportunity for quantifying purportedly causal effects.
Natural experimental methods come with a price, however. They rely on the assumption that the effects being estimated have been effectively randomized via the natural process within populations. The explicit randomization conducted prior to clinical trials is replaced with this assumption, and there is no statistical method by which to ensure it is true.
In the policing case, this assumption takes the form of supposing that elevations of the color-coded HSAS were not associated with changes in crime rates in the District of Columbia. The assumption in the Mariel boatlift case turns on Miami’s labor market prior to 1980 having had no association with the coming Mariel emigres. Both examples illustrate that the assumptions underpinning causal estimates can and should be debated.
For instance, what if DC’s municipal police have close relationships with the U.S. Marshals, Department of Homeland Security personnel, or members of the armed services at the Pentagon? What if Miami’s Cuban population shares extended familial or financial ties with political refugees attempting to exit Cuba and find work? To the extent that the natural assignment of groups to conditions is not random, the statistical estimates are more likely to be confounded, undermining the proposed causal effect.
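One concrete way to probe the randomization assumption is a balance check: compare the groups on observable characteristics measured before the event. A minimal sketch, with hypothetical group names and numbers:

```python
import statistics

def standardized_difference(group_a, group_b):
    """Difference in group means scaled by the pooled standard deviation;
    values near zero suggest the groups looked comparable pre-event."""
    pooled_sd = statistics.stdev(list(group_a) + list(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical pre-event hourly wages for the two comparison groups:
quasi_treatment_pre = [9.1, 8.7, 9.4, 8.9, 9.0]
quasi_control_pre = [9.0, 8.8, 9.3, 9.1, 8.9]
balance = standardized_difference(quasi_treatment_pre, quasi_control_pre)
print(f"standardized difference: {balance:.3f}")
```

A large pre-event gap warns that the “natural assignment” was not so random after all; a small one is reassuring but, as noted above, cannot prove the assumption holds.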
Building Your Own Experiments
Still, a clear opportunity exists for businesses seeking to quantify the impacts of different actions they can take. The natural-randomization assumptions that quasi-experimental methods turn on can be sidestepped entirely if companies experiment with their own business processes: that is, if they randomly assign different groups of individuals to treatment and control conditions.
For instance, to estimate the revenue impact of different ways of marketing a new sale, a firm might randomly assign different advertising formats among known customers, between markets, or between store locations within markets. There is no longer any need to assume that some event performed the randomization, because the firm has actively designed its own experiment, allocating treatment and control conditions as if it were the clinician in a medical trial.
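The assignment step itself is simple to implement. A sketch, assuming a list of known customer IDs and hypothetical condition names:

```python
import random

# Sketch of in-house random assignment: shuffle the customer list, then
# deal customers round-robin into conditions so group sizes differ by at
# most one. Condition names and customer IDs are hypothetical.

def assign_conditions(customer_ids, conditions, seed=42):
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    ids = list(customer_ids)
    rng.shuffle(ids)
    assignment = {c: [] for c in conditions}
    for i, cust in enumerate(ids):
        assignment[conditions[i % len(conditions)]].append(cust)
    return assignment

groups = assign_conditions(range(1000), ["email_ad", "banner_ad", "control"])
print({name: len(members) for name, members in groups.items()})
```

Recording the seed (or the full assignment) matters: it lets the firm later verify exactly who was exposed to which condition.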
This is not quite the gold standard of an actual clinical trial, because customers in markets are not groups in laboratory settings where mitigating factors can be closely controlled. Customers can also be lost to other firms or markets, and the individuals assigned to treatments and controls might never be known, such as when treatment and control groups comprise geographic areas or store locations.
Still, reliable conclusions are on offer because of the control businesses have over their own processes. So, while not quite a randomized controlled trial, small-scale experimentation involving random assignment can yield deeper, more reliable insights than even natural-experimental methods can.
Don’t Rely on Data Alone
Rather than increasing investment in new data infrastructure or attempting to kludge together disparate data systems whose incompatible designs serve different corporate purposes, it can pay (in dollars and labor-hours spent, as well as in avoided systemic failures) to think closely about how to design tests of underlying processes that tease out potential causal effects. Whether that design rests on the strong assumption that a sudden shift in policy or process has provided a quasi-experiment, or on a randomized experiment designed in-house, the potential benefits of better estimates and a causal explanation for why something occurred can outweigh the drawbacks of attempting to institute AI-based solutions when they are either too costly or simply impossible to implement.
More attention paid to the potential causes of effects, to how experiments can be designed to identify them, and to the benefits and drawbacks these small-scale experiments pose for business processes can yield valuable insights cost-effectively. Knowing what caused what naturally sets up thinking about which processes are more or less effective. Most big-data algorithmic estimates are “black box” and purely predictive; they cannot answer questions about how and why things arise.
So thinking in terms of experiments, whether that means randomly shutting off a marketing channel’s spending, assigning different hours-tracking systems to a random subset of workplaces, or comparing how workers take sick leave across states with different leave policies, can yield insights that are relevant to decision-makers and that supply them with reasons for their decisions.
Besides, AI hasn’t even won a Nobel prize (yet).