Data Visualization Charting Lessons From COVID-19

In many ways, data visualization has been instrumental in how we’re processing COVID-19. The Washington Post’s animated-dots simulation became the paper’s most-viewed article ever. Data-visualization journalist John Burn-Murdoch has seen his social media following balloon, justifiably, thanks to his charts in the Financial Times. And the very concept of “flattening the curve” has become vital in the battle against the pandemic, even if data professionals took exception to some versions of the chart that failed to accurately communicate hospital capacity.

It’s also a lesson unfolding in real time.

“This is going to be an excellent case study for a lot of future graduate students and undergrads of things done well and things done poorly,” said Amanda Makulec, a senior data visualization lead at Excella and operations director of the Data Visualization Society.

Why Is Data Visualization Important?

Data visualization is the practice of taking large amounts of information (data) and placing it in a visual context, such as a graph or chart, so that people can more easily spot patterns and understand the underlying trends. In essence, it turns datasets too large or complex to grasp directly into visuals that are easy to understand.

Professionals caution that the pandemic’s data visualization lessons won’t really be known until after the fact. “We won’t be able to properly evaluate all this until we look back and understand how we can build standards,” said Elijah Meeks, a data visualization engineer at Apple and the executive director of the Data Visualization Society. He likened discussion about the current moment to the mass criticism that sometimes ignites over a Wikipedia article about a still-unfolding event.

Still, the pandemic has surfaced some important conversations that have long been ongoing within data visualization, from how to visualize uncertainty to how to humanize data and when it’s better to say something rather than display it.

Illustrating Uncertainty

Anyone who’s followed the news knows that the coronavirus data we have is incomplete and flawed. There are testing discrepancies, variations in data collection methods and, likely, missed diagnoses. It was unknowns like these that led FiveThirtyEight not to build a COVID-19 model at all. Real-world data is always messy, but this is next-level uncertainty. Uncertainty, it turns out, is not easy to visualize, even though doing so is manifestly important. (To be clear, uncertainty here refers to the range of possible outcomes around a dataset’s estimate or a predictive model’s forecast, not to the quality of the data, although flawed data naturally compounds it.)

Generally speaking, we don’t see much visualization of uncertainty. Data designers do have tools for expressing it, such as error bars and probability density plots. Research by the Midwest Uncertainty Collective (MUC), which studies methods of communicating uncertainty in visualizations, has shown that a technique called the quantile dotplot holds strong potential for communicating uncertainty effectively. But the default is often to shy away from such tools.

There are several reasons for this, according to Jessica Hullman, a co-director of MUC who’s extensively studied the question. First, not visualizing uncertainty has become the perceived norm. By bucking that trend, data-viz professionals risk having their visualizations perceived as hedged bets. They “don’t want to be perceived as not confident in what they’re showing,” Hullman said.

Also, it’s difficult to calculate and show uncertainty. “Even if someone knows how to make a decent visualization, to actually go and calculate intervals or probability distributions is not always easy,” Hullman said.

But most importantly, designers fear, not unreasonably, that their audience will be confused. “For a lot of people, looking at graphs is hard enough,” she said. “So if you have to explain things like probability, you’re going to lose readers on it.”
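To give a rough sense of the calculation Hullman describes, the sketch below computes one standard interval for a simple rate: the Wilson score interval for a binomial proportion. The counts (45 positive results out of 300 tests) and the function name are hypothetical, and the interval reflects only sampling variability, not the testing and reporting problems described earlier.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, trials: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (e.g., test positivity)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    # Two-sided critical value from the standard normal: about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return center - half_width, center + half_width


# Hypothetical counts: 45 positive results out of 300 tests.
low, high = wilson_interval(45, 300)
print(f"positivity {45 / 300:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

Wilson is used here instead of the simpler normal-approximation (Wald) interval because its bounds always stay between 0 and 1 and it behaves better with small counts. Even then, producing the numbers is only half the job; the interval still has to be drawn in a way readers understand.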

Even audiences who are relatively well-versed in data viz may struggle with comprehending uncertainty methods: “Uncertainty symbology is very difficult, even inside highly literate, data-driven organizations,” Meeks said.

Uncertainty visualization has become a rich subfield of data visualization research in recent years, Meeks said. (He cites MUC among the groups doing strong work.) But that hasn’t yet translated into meaningful adoption. When uncertainty is visualized, it’s often done in a way that suggests the designer hasn’t fully considered readability, Meeks said. He pointed to leading information designer Alberto Cairo’s critique of the confusing “cone of uncertainty” in hurricane maps.

More often, though, it’s altogether absent. “When it comes to actual products that are out in the wild, you hardly see it,” he said.

The “cone of uncertainty” illustrating the projected path of Hurricane Irene. | Image: Wikimedia Commons / NOAA
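The quantile dotplot that MUC has studied represents a forecast as a small, fixed number of dots, each standing for an equally likely outcome, so readers can count dots instead of reading probabilities off a curve. The sketch below is a minimal illustration under stated assumptions: a hypothetical normal forecast of daily hospitalizations (mean 120, standard deviation 25), 20 dots, and invented function names. Each dot sits at an evenly spaced quantile of the distribution, and the dots are stacked into bins as a text chart.

```python
from collections import Counter
from statistics import NormalDist


def quantile_dots(dist: NormalDist, n_dots: int) -> list[float]:
    """Place n_dots at the evenly spaced quantiles (i + 0.5) / n_dots of dist."""
    return [dist.inv_cdf((i + 0.5) / n_dots) for i in range(n_dots)]


def dotplot_rows(values: list[float], bin_width: float) -> list[str]:
    """Stack the dots into bins of bin_width and render one text row per bin."""
    bins = Counter(int(v // bin_width) for v in values)
    rows = []
    for b in range(min(bins), max(bins) + 1):
        lo = b * bin_width
        rows.append(f"{lo:>6.0f}-{lo + bin_width:<6.0f} {'o' * bins[b]}")
    return rows


# Hypothetical forecast: daily hospitalizations ~ Normal(mean=120, sd=25).
forecast = NormalDist(mu=120, sigma=25)
dots = quantile_dots(forecast, n_dots=20)

for row in dotplot_rows(dots, bin_width=20):
    print(row)
```

Because every dot is equally likely, a reader can estimate a probability such as “more than 160 hospitalizations” by counting the dots beyond that level and dividing by 20. A real chart would draw the same positions as circles with a plotting library.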

Notices of Intent

The tension between visualization readability and communicating uncertainty is long-standing in data viz, but COVID-19 has thrown it into sharper relief than perhaps ever before. In a recent webinar hosted by Northwestern University, Hullman and her fellow MUC co-director, Matthew Kay, examined a number of pandemic-related visualizations through the lens of uncertainty. One notable way of communicating uncertainty, Kay noted, is the disclaimer.

Kay singled out one interactive simulation. Above its scale chart and interactive toolbars sits a note:

“Disclaimer: This simulation is for research and educational purposes only and is not intended to be a tool for decision-making. There are many uncertainties and debates about the details of COVID-19 infection and transmission and there are many limitations to this simple model.”

A 3Blue1Brown video that illustrates how various isolation methods could affect predictions is another good example, Kay noted. “You can’t rely on a reader to set some parameters or rerun a simulation a couple of times,” he said in the webinar. “You really need to be upfront about those assumptions and try to guide people through their implications.”

Another example? In a recent Tableau post, Amanda Makulec pointed to the USA COVID-19 Live Chart, which includes a disclaimer that outlines how policymakers and doctors might use the visualization differently.

Whereas footnotes offer a place for contextual information on how to better interpret a visualization, disclaimers can be a broader cards-on-the-table opportunity, according to Makulec. “The word ‘disclaimer’ is really a bigger conversation about how good are we at disclosing our expertise and our intention,” she told Built In.

Makulec argued in the Tableau post that any non-epidemiologist building COVID-19 dashboards should consider including a disclaimer that states their expertise and intentions, any uncertainty in the data, how the data should not be used (“namely for medical decision-making”) and helpful links. She also included a template that others can modify.

“Are you just a concerned citizen who has data visualization skills and built something to enable your own understanding? Be transparent,” she told Built In. “The more we do that, the more we can foster transparency about how and why we’re creating different charts and graphs — and who they are designed for.”
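The elements Makulec lists map naturally onto a few fields. The sketch below is one hypothetical way to keep them next to a dashboard’s code so the disclaimer is rendered with the chart rather than forgotten; the `Disclaimer` class, its field names and the sample text are illustrations, not the template from her post.

```python
from dataclasses import dataclass, field


@dataclass
class Disclaimer:
    """Hypothetical container for the elements a dashboard disclaimer should state."""
    author_expertise: str
    intent: str
    data_uncertainty: str
    do_not_use_for: str
    links: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the disclaimer as plain text, one element per line."""
        lines = [
            "Disclaimer",
            f"Who made this: {self.author_expertise}",
            f"Why it was made: {self.intent}",
            f"Known uncertainty in the data: {self.data_uncertainty}",
            f"Please do not use this for: {self.do_not_use_for}",
        ]
        if self.links:
            lines.append("Better sources: " + ", ".join(self.links))
        return "\n".join(lines)


# Illustrative values only.
note = Disclaimer(
    author_expertise="A data analyst with no training in epidemiology.",
    intent="Built to help me follow reported case counts in my county.",
    data_uncertainty="Counts depend on testing volume and reporting delays.",
    do_not_use_for="medical decision-making",
    links=["<your local public-health agency's site>"],
)
print(note.render())
```

Storing the disclaimer as structured fields makes it harder to ship a chart with one of the elements silently missing.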

Along with disclaimers, Kay also cited animation in his webinar as a potential tool for communicating uncertainty. He mentioned the famous Washington Post bouncing-dots simulation, plus a mock-up he made using so-called hypothetical outcome plots (HOPs). In HOPs, the chart animates through a series of possible outcomes, so a line wiggles or shifts from frame to frame, conveying uncertainty more intuitively than, say, static (and potentially confusing) error bars. MUC has released a couple of studies that point to HOPs’ strong potential for communicating uncertainty in visualizations.
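The core of a HOP is a set of frames, each showing one plausible outcome drawn from the model. The sketch below only generates those frames, under invented assumptions: each frame samples a hypothetical daily growth rate (normal, mean 5%, standard deviation 2%) and projects 14 days of cases from a starting count of 1,000. The function name and parameters are illustrative; a real HOP would hand the frames to an animation or charting library and cycle through them.

```python
import random


def hop_frames(n_frames: int, days: int, start: float, seed: int = 0) -> list[list[float]]:
    """Return one projected curve per frame, each from a freshly sampled growth rate."""
    rng = random.Random(seed)  # seeded so the frames are reproducible
    frames = []
    for _ in range(n_frames):
        growth = rng.gauss(0.05, 0.02)  # hypothetical uncertainty in daily growth
        frames.append([start * (1 + growth) ** d for d in range(days)])
    return frames


frames = hop_frames(n_frames=30, days=14, start=1000)

# A crude stand-in for the animation: print the final-day value of a few frames.
for i, curve in enumerate(frames[:5]):
    print(f"frame {i}: day {len(curve) - 1} ~ {curve[-1]:,.0f} cases")
```

Shown one at a time, the frames let viewers judge the spread by how much the line jumps, without having to decode an error bar.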

Finding that sweet spot of visualization readability while conveying the full breadth of information, context and variance is an eternal challenge. That’s why a robust critique process is important, even in less fraught times, Makulec told Built In.

“Shouldn’t we be asking these questions even if we’re just visualizing data on, like, America’s favorite pizza toppings?” she asked. “Feedback is always good.”

Makulec also recommended a DataMixed article from last year in which Ben Jones, author of Avoiding Data Pitfalls, offers advice on giving and soliciting feedback, both public and private.

The People Behind the Data Plot

Like uncertainty visualization, the challenge of humanizing data is nothing new, but COVID-19 has pushed it to the forefront. One promising technique for showing the people behind the data, Kay demonstrated in the webinar, is the dotplot.

Kay built a mock-up using hypothetical data in which the “curve” of patients who need a bed is composed of dots, with each dot representing 10,000 people. That simple shift away from a more abstract line goes a long way. “The idea is to help you reason about the fact this is affecting real people,” he explained.
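The conversion behind such a chart is simple arithmetic: divide each count by the number of people one dot stands for. The sketch below uses invented weekly counts of patients needing a bed and, as in Kay’s mock-up, 10,000 people per dot; the names are hypothetical. Each row is one week, so the rows trace the curve sideways.

```python
PEOPLE_PER_DOT = 10_000

# Hypothetical weekly counts of patients needing a hospital bed.
weekly_patients = {
    "week 1": 12_000,
    "week 2": 38_000,
    "week 3": 71_000,
    "week 4": 54_000,
    "week 5": 23_000,
}


def to_dots(count: int, per_dot: int = PEOPLE_PER_DOT) -> int:
    """Number of dots to draw; Python's round() sends exact halves to the even integer."""
    return round(count / per_dot)


for week, count in weekly_patients.items():
    print(f"{week:<7} {'o' * to_dots(count)}  ({count:,} people)")
```

Because of rounding, each dot is approximate, which is worth stating in the chart’s notes.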

“This is maybe one of the first big cases in a while where people feel like lives near them are threatened, so it’s about really making it clear this could hit your community, by showing the data in a way that makes people think, this is real people,” Hullman said.

Another way to personalize the data is to localize it, as with interactive visualizations that present data specific to a reader’s location, Hullman said. Concerns about data quality may have kept designers from building many such visualizations, but localization remains an attractive option wherever the local data is reliable. “I think that could’ve been useful here earlier on,” she said.

A similar thought struck Makulec. “Information that puts me in the center of the context has been very helpful,” she said. She’s focusing more on the news around her D.C. home than on constantly refreshing aggregate totals.

“The case counts daily at this point in my life don’t add a lot of value because they don’t change what I’m doing day to day,” Makulec said. “I try to pay more attention to recommendations coming from public-health officials ... rather than focusing on these big aggregate numbers that I know are really fuzzy.”
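Hullman’s localization idea can be sketched as filtering a dataset to the reader’s area before charting it, with a fallback to a broader view when local data is missing or too sparse to trust. The sketch assumes a hypothetical table of daily case counts keyed by county and an illustrative minimum-count threshold; the county names, numbers and function name are invented, and a real interactive piece would take the location from the reader’s input.

```python
# Hypothetical daily case counts by county; names and numbers are invented.
CASES = {
    "County A": [3, 5, 4, 8, 6],
    "County B": [0, 1, 0, 0, 1],
    "County C": [40, 52, 47, 61, 58],
}
MIN_TOTAL = 10  # assumed threshold below which local numbers are too sparse to chart alone


def local_view(county: str) -> tuple[str, list[int]]:
    """Return (label, series) for the reader's county, or a statewide fallback."""
    series = CASES.get(county)
    if series is not None and sum(series) >= MIN_TOTAL:
        return county, series
    # Sum across counties day by day to build the broader fallback series.
    statewide = [sum(day) for day in zip(*CASES.values())]
    return "Statewide (local data too sparse)", statewide


label, series = local_view("County B")
print(label, series)
```

The fallback label tells the reader why they are not seeing their own area, which keeps the localized view honest about the limits of the data.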

When to Sit It Out

Some have taken the fuzzy numbers and run, however. For Meeks, that spotlights a sort of perfect storm that has arisen during the pandemic: a combination of sophisticated and accessible data products, health-agency data within easy reach, and a relative rise in data literacy has allowed for the creation and distribution of less-than-rigorous visualizations.

“There’s a larger tension around the maturation of data visualization and how now it’s being used, very unabashedly, incorrectly,” Meeks said. “Sometimes that’s because of malicious intent, and oftentimes it’s because the people doing it have very easy access to data. They might have a skill set that makes them comfortable dealing with analytical data, but no domain expertise.”

Such democratization and accessibility are “wonderful” under normal circumstances, but in extraordinary times, it’s important to ask “whether it’s morally justified to produce things that might affect people’s decisions if you yourself are a dilettante in the domain,” Meeks said.

Makulec stressed the point too: don’t treat fraught situations as training grounds. “There are a lot of other opportunities for learning and for prototyping and for sharing and soliciting and getting feedback without the same level of risk,” she said.

Indeed, it’s difficult to imagine another scenario that combines incomplete data and high stakes so extremely, but in any situation it’s worth considering whether words would communicate better than visuals. In short: use your words. “I think words and pictures are important, more so than charts,” said Makulec, noting that a chart, if screenshotted and tweeted, immediately loses any clarifying body copy, however helpful.

The risk of misinterpretation has been on Makulec’s mind a lot lately, particularly in data visualization journalism. Who bears the responsibility if visualizations are misinterpreted? The data collector, the data designer, the journalist who contextualized the chart or graph, or the reader?

There’s always some responsibility on the reader, but in a situation where audiences are stressed and distracted, “the onus and the responsibility is even more so on the data visualization designer and journalist,” she said.

That means knowing when to say no — even during a time when, as Meeks pointed out, data professionals working in a portfolio-driven industry might feel compelled to produce a standout chart. “If the risk of misinterpretation is high, we should put a higher bar on what we choose to put into the public sphere,” Makulec said.
