Airing Data Errors Doesn’t Have to Be Embarrassing
In early February, 16 data science professionals, many of them senior level, took the virtual stage to get a bit vulnerable. It was the inaugural Data Mishaps Night, where data folks were invited to discuss instructive blunders from their pasts.
“People in the data space are more willing than people in other spaces to share mistakes because so much of what we’re doing is so new,” said Caitlin Hudon, co-organizer of Data Mishaps Night and also lead data scientist at OnlineMedEd and co-organizer of the R-Ladies Austin meetup.
Such transparency encourages junior data folks.
“When I first started in warehousing, and then BI and data science, I distinctly remember the terror I had when I would report wrong data,” said Laura Ellis, co-organizer of Data Mishaps Night and analytics architect at IBM.
Sharing mistakes helps everyone fend off impostor syndrome and discover pitfalls before they happen. There’s also the commiseration factor.
Thinking back on Data Mishaps Night, plus her talk at the 2019 RStudio Conference about learning from eight years of data science mistakes, Hudon said that “people have learned a lot, and also just gotten a little bit of data therapy.”
That’s important in a realm where the shiny end product often gets prominent placement, while the many rounds of trial and error that led to it stay out of view.
Hudon offered the example of someone presenting data plots at a conference: “You’ll see the end result — a good title and color scheme, reflecting the message they want their data to show. But you don’t see the dozens of Google searches for ‘How do I get this label to fit?’ ‘How do we shift the plot so the axes are flipped?’”
Realizing others were encountering the same trouble, even if it wasn’t always broadcast, was a “watershed moment” early in her career. There might have been errors, but they were the right kind of errors.
In that spirit of openness, Hudon and Ellis walked us through a few of their own slip-ups — and the lessons they offer.
Mistake #1: The A/A Test
If you’ve ever gotten a call from the dentist’s office notifying you that another patient’s cancellation opened an appointment slot, it might not have been just happenstance that the office chose you.
Years ago, Hudon worked for a company that wrote an algorithm designed to predict the person most likely to say “yes” to such outreach.
To test a new version of the algorithm, the team whipped up an A/B test. They explained to the developers who fit into each group and prepared for launch. But Hudon was hit with a surprise when reviewing the results: The control group and the treatment group were identical. Everybody got the new version.
“It was an A/A test,” she said.
Luckily, it wasn’t totally unsalvageable. Enough earlier data existed to allow for some pre-post analysis.
“But it taught me a ton about making sure that everyone understands the purpose of why they’re doing the work before they do it. ... And it taught me the importance of just communicating with people very early in the process for rolling out something big like that,” Hudon said.
Now, Hudon is working on a churn-prediction model and has some hypotheses outlined. But, true to what she learned from that early data mistake, she spent the first two weeks of the project interviewing stakeholders — 14 groups across the organization.
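One cheap guard against an accidental A/A test is to look at what was actually served before reading any results. The sketch below is illustrative only and is not the check Hudon's team used. It assumes a hypothetical log that maps each user ID to the variant that user received, and the function name, variant labels and 10 percent threshold are all made up for the example.

```python
from collections import Counter


def check_assignment(assignments, expected_variants=("control", "treatment"), min_share=0.1):
    """Return a list of problems found in the served-variant log.

    assignments: dict mapping user_id -> variant actually served (hypothetical log format).
    min_share: arbitrary illustrative threshold for the smallest acceptable group share.
    """
    counts = Counter(assignments.values())
    total = sum(counts.values())
    problems = []
    for variant in expected_variants:
        share = counts.get(variant, 0) / total if total else 0.0
        if share < min_share:
            problems.append(f"{variant}: only {share:.0%} of users")
    return problems


# Everybody got the new version, so the control group is empty.
served = {"u1": "treatment", "u2": "treatment", "u3": "treatment", "u4": "treatment"}
print(check_assignment(served))
```

Run on the toy log above, the check reports that the control group received 0 percent of users. A check like this does not replace talking to the developers early, but it would surface the problem on day one rather than at review time.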
Mistake #2: Too Good to Be True
Ellis was working on an important initiative — big in scope and impact. It involved mapping anonymous IDs to authenticated IDs. That is, it meant anonymous users would, hopefully, be converted into identified ones. It was rolling out on a new system, which had a new method of assigning anonymous IDs.
The first step was checking the raw numbers. How many IDs, either anonymous or authenticated, were being collected?
“I wanted the numbers to be through the roof,” Ellis said. “And guess what? They were.” Loads of data were streaming in.
Being in data can feel like being a meteorologist, Ellis said. When the 10-day forecast is nothing but 68 degrees and clear, everybody’s thrilled to hear from you. In this instance, the news was all sunshine.
Ellis was thrilled, but also a bit skeptical. Were the numbers too good to be true? She sampled some 100 people to make sure. Everything checked out.
But when Ellis went to check the conversion metrics half an hour later, the numbers were less party-starting. They were, in fact, in the tank.
“It was unbelievable that these numbers were so high, and it was unbelievable that these [other] numbers were so low,” she said.
Turns out, there had been a bug that duplicated the anonymous IDs. “If you logged in five times, you were five people,” Ellis explained.
“Even if you’re under a lot of pressure in the moment to get the numbers out, if it feels too good to be true, you should really just take a step back and question it one last time,” she added.
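A bug like this one tends to show up when counts are grouped by person rather than read as a raw total. The following is a minimal sketch, not Ellis's actual process. It assumes login events are available as hypothetical (anonymous ID, authenticated ID) pairs, and the cutoff of three IDs per user is an arbitrary number chosen for illustration, since a real user may legitimately have several devices.

```python
from collections import defaultdict


def anonymous_ids_per_user(events):
    """Count distinct anonymous IDs seen for each authenticated ID.

    events: iterable of (anonymous_id, authenticated_id) pairs (hypothetical format).
    """
    ids = defaultdict(set)
    for anon_id, auth_id in events:
        ids[auth_id].add(anon_id)
    return {auth_id: len(anon_ids) for auth_id, anon_ids in ids.items()}


def suspicious_users(events, max_ids=3):
    """Return authenticated IDs mapped to more anonymous IDs than max_ids."""
    counts = anonymous_ids_per_user(events)
    return sorted(user for user, n in counts.items() if n > max_ids)


# One user logs in five times and receives a fresh anonymous ID each time.
events = [(f"a{i}", "alice") for i in range(1, 6)] + [("b1", "bob")]
print(anonymous_ids_per_user(events))
print(suspicious_users(events))
```

In this toy data, "alice" is counted as five anonymous visitors and gets flagged, while "bob" does not. A high ratio of anonymous IDs to authenticated users would explain both symptoms in the story: an inflated ID count and a depressed conversion rate.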
Mistake #3: Understand the Source
In college admissions, there’s a term called yield. It’s the percentage of students who actually enroll in a college or university after being admitted. It’s considered a marker of prestige — a very high yield means prospective students are eager to become actual students. But there are obvious logistical components related to yield as well. An institution needs to have a good sense of how many students will commit in order to manage a mess of considerations, from financial aid to housing and beyond.
Hudon built an enrollment-prediction model to tackle this exact problem for a client. It was her first such model, but it initially looked great.
“I thought I was just brilliant, because I got a really good model, very accurate. ‘OK, this is looking good,’” Hudon recalled.
But during a tech review with the company CEO, they spotted an issue. The model included a variable called “campus visit.” That would seem to be pretty predictive. If someone visited the school after having been accepted, that’s a clear sign of strong interest, no?
But the variable had no date. A post-acceptance visit was weighted just as much as, say, an initial visit to feel out the school.
The mistake was hardly catastrophic, and Hudon now recalls it with a chuckle. But it was a good lesson in getting acquainted with new domains and avoiding assumptions about data sources.
“That taught me a lot about the importance of understanding your data, how it’s generated and where it’s coming from,” Hudon said.
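To make the timing problem concrete, here is a small sketch of how a single undated "campus visit" flag could be split if visit dates were available. That is an assumption: in Hudon's case the variable had no date, and the article does not say how the model was fixed. The function and field names are hypothetical.

```python
from datetime import date


def split_campus_visits(visit_dates, admitted_on):
    """Separate visits made before the admission decision from those made after.

    visit_dates: list of datetime.date values for one applicant (assumed to be available).
    admitted_on: the date the applicant was admitted.
    """
    before = sum(1 for d in visit_dates if d < admitted_on)
    after = sum(1 for d in visit_dates if d >= admitted_on)
    return {"visits_before_admission": before, "visits_after_admission": after}


# An exploratory visit in the fall and a second visit after the acceptance letter.
visits = [date(2020, 10, 3), date(2021, 4, 10)]
print(split_campus_visits(visits, admitted_on=date(2021, 3, 15)))
```

With two separate features, a model can weight a post-acceptance visit differently from an early exploratory one, and anyone reviewing the model can see which signals were actually known at prediction time.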
Cultures of Transparency
When Hudon shared some of her mess-ups during the 2019 RStudio Conference, and when she and Ellis organized the opportunity for data scientists to do the same earlier this year, both events were designed to be safe environments. That raises the question of how much self-transparency is good in the workplace, where the vibes might not always be so supportive.
“It’s a tricky dance,” Ellis said. Broadcasting every marginal error might just make higher-ups worry unnecessarily about your numbers, but you also don’t want a culture of hiding mistakes.
She recalled one team member who was terrified of slipping up. Only once the employee felt fully reassured that they could fail safely could they really progress.
“The results would be wrong, but we couldn’t dig into it as a team,” Ellis said. “Of course the results are wrong; it’s the first time we’re looking at a complex scenario.”
“So it’s not just in community sharing, but it’s within your team, especially the junior employees,” she added. “We have to make sure they know it’s OK to make mistakes.”
Being Mindful About How — and Who
Encouraging people to share mistakes has to be handled with care, especially in a professional environment. If, say, only a small handful of volunteers are being transparent about their errors, it creates an imbalance.
Ellis takes part in regular knowledge-sharing sessions. It keeps everyone attuned to the latest features and developments, but it’s also meant to provide a space where everyone can share their experiences in the process — good or bad.
For a time, everyone was only sharing the good, so there was an active effort to counteract that with more mistake-sharing. But the three folks looking to share their mistakes were all senior technical women.
“We had to take a step back, because we’re not going to have three technical women leaders share their whoopsie moments, and then the rest, predominantly men, go after and talk about their successes,” Ellis said.
“You have to be careful about creating a safe space to share mistakes with the right context,” she added. “You have to kind of create the right culture and venue for that.”