How to Run a Remote Data Team
Automattic, the company behind WordPress.com, WooCommerce and Tumblr, has long been fully remote and has long preached the gospel of distributed workforces. It turns out the company was ahead of the curve in a way it never could have foreseen.
“This is not how I envisioned the distributed work revolution taking hold,” wrote company founder Matt Mullenweg on his blog in early March, as the novel coronavirus pandemic forced companies around the world to embrace work-from-home as much as possible.
Teams of every stripe, data teams included, are suddenly confronted with how best to operate in the new reality. There’s nothing innately remote-averse about data work, but common career paths into data often make the transition less intuitive than it might otherwise be.
One common on-ramp into data is via operations, according to Georgia-based Emilie Schario, an internal strategist for the data team at GitLab, which, like Automattic, is fully remote.
In such scenarios, an ops veteran comes to realize the value of data and develops a knack for Excel. “But they grow tired of doing the same things over and over, so they learn to automate it with more technical workflows, then start moving down the more technical route into a data analyst or data engineering role,” Schario said.
That’s a fine career transition, but it does little to expose that employee to the asynchronous workflows that are so deeply embedded in both distributed work environments and DataOps, which, broadly speaking, parallels DevOps by applying standard coding practices to data work.
So “even modern data teams have a hard time working remotely,” Schario said. “For the most part, that’s because of the path people take.”
Yanir Seroussi, a Brisbane-based data scientist at the aforementioned Automattic, agrees. “Some people come from academia or doing manual analysis in Excel, so there’s less awareness of engineering best practices,” said Seroussi, who penned a blog post in 2018 about data teams adapting those best practices, which include that all-important asynchronous component.
“It’s that classic thing of, this meeting could have been an email,” said Matthew Allen, a Boulder-based data scientist at Buffer, which has also been fully remote for years. At Buffer, “we tend to default to async as opposed to defaulting to Zoom calls,” he said. That means a lot of Jira card comments and notes on open pull requests, rather than daily teleconference standups.
Buffer and Automattic both also use an email replacement of sorts. The former uses Threads and the latter P2, both of which are internally transparent, archivable conversation threads that more closely mimic message boards than email.
“It’s good for things that are longer than a chat message, and you can store things long-term,” Seroussi said.
For example, he recently computed uncertainty estimates and confidence intervals for A/B testing improvements that his team is trying to fine-tune. He logged it all in P2, which he said helped him organize his thoughts, served as documentation and provided a place to gather feedback from colleagues.
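For readers unfamiliar with that kind of analysis, here is a rough sketch of an interval calculation such a write-up might include, assuming a simple two-variant test on conversion rates. The function name, the sample counts and the choice of a normal-approximation (Wald) interval are illustrative assumptions, not details of Automattic’s actual analysis.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald interval for the difference in conversion rates (B minus A).

    conv_a, conv_b: number of conversions in each variant (hypothetical inputs)
    n_a, n_b: number of users exposed to each variant
    level: confidence level, e.g. 0.95 for a 95 percent interval
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Made-up counts: 120 of 1,000 users converted on A, 150 of 1,000 on B.
low, high = diff_confidence_interval(120, 1000, 150, 1000)
print(f"Estimated lift: 0.030, 95% interval: ({low:.4f}, {high:.4f})")
```

An interval that barely clears zero, or straddles it, is the kind of result that benefits from being posted somewhere colleagues can question the assumptions behind it.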
Longtime fully distributed companies like Buffer, GitLab and Automattic are of course inclined toward async by necessity — their employees are scattered across the globe. But the kind of benefits described above translate to more centralized remote teams too, according to the data folks with whom we spoke.
It cuts down on “mini-distractions” and constant so-called context switching, which is the last thing one wants while in the throes of programming or complex data analysis, Seroussi said.
It also boosts inclusiveness. The more communication can be made asynchronous, the more flexibility workers are afforded. People with children, for example, are then able to split their day around family obligations, Seroussi noted.
Of course, not every team can deprioritize instant communication, but data often can. “Most [work communications] are not an emergency,” Seroussi said.
The async-forward culture of programming also suits the nitty-gritty of data work, not just communication. GitLab, for instance, uses its own collaboration medium, known as issues, when going under the hood of its code.
“It’s the idea of working in issues and then making any code changes you have, as opposed to a world where you might write a SQL query to pull some data and do additional manipulation in Excel or Google Sheets,” Schario said.
Permissioning and Access
Outsiders may think remote data teams face unique challenges related to database access and permissioning. But the central challenge in fact relates less to access and more to establishing workflows, according to all three data pros with whom we spoke.
“It’s 100 percent worth considering security, but … I would say that four office walls is a false sense of security,” Schario said. GitLab manages permissions through Okta and has adopted the Zero Trust framework — both steps that could be taken in a non-remote environment.
“It’s not about limiting what IP addresses can connect to databases,” Schario said. “It’s about making sure that people have the least privilege and you’re cycling through credentials appropriately.”
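To make those two ideas concrete, here is a minimal sketch of what an automated check might look like. Everything in it is hypothetical: the 90-day rotation window, the credential names and the privilege labels are assumptions for illustration and do not describe GitLab’s actual tooling or policy.

```python
from datetime import datetime, timedelta, timezone

# Assumed rotation window; a real policy would set its own value.
MAX_CREDENTIAL_AGE = timedelta(days=90)


def stale_credentials(issued_at, now, max_age=MAX_CREDENTIAL_AGE):
    """Return the names of credentials issued more than max_age ago."""
    return sorted(
        name for name, issued in issued_at.items() if now - issued > max_age
    )


def excess_privileges(granted, required):
    """Return privileges an account holds but does not need."""
    return set(granted) - set(required)


# Hypothetical inventory of credentials and when each was last rotated.
now = datetime(2020, 4, 1, tzinfo=timezone.utc)
issued_at = {
    "analyst_readonly": datetime(2020, 3, 1, tzinfo=timezone.utc),
    "etl_service": datetime(2019, 11, 1, tzinfo=timezone.utc),
}

print(stale_credentials(issued_at, now))
print(excess_privileges({"SELECT", "INSERT", "DELETE"}, {"SELECT"}))
```

Neither check depends on where an employee is sitting, which is Schario’s point: the same review applies inside or outside an office.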
GitLab’s workflow also includes Snowflake for data warehousing, GitLab for version control (no surprise) and Slack. “Otherwise how would I send [my-coworker] GIFs all day?” Schario said.
Buffer is structured somewhat differently, but uses a similar workflow, even if some tools vary. The four-person data team has access to essentially all the company’s databases — internal radical transparency, as Allen described it. The company uses GitHub for source control management, Google Cloud for warehousing and Google Authenticator for access.
“Everything is done through the browser,” he said. “Simplifying and consolidating the tools makes it way easier to manage.”
That might all sound worry-free enough, but it doesn’t necessarily alleviate one of the most commonly cited work-from-home anxieties: But how will my boss know that I’m being productive?
For data teams in particular, that shouldn’t be a concern.
“There’s nothing like not working in the office to let your work really speak for itself,” Schario said. “When you’re being judged on your results — whether that’s analysis or data science or another role — you know people aren’t judging you based on how great you are at refilling the communal coffee pot.”
Any such anxiety highlights what might be an inefficiency within traditional American office culture, according to Allen. “Seeing somebody in the office is not the same as making sure they’re doing their work,” he said.
Project details, success measurements and individual expectations and timelines all need to be clear. They also need to be fair and achievable.
“We’re lucky as a software company and a data team,” Allen said. “If you don’t do your work, the project won’t get done. There’s no tangible results. Not every role and every company has that luxury, but a lot do.”
“You just have to really spell out expectations up front,” he added.
Still, the real challenge is often less about fears of missing short- and long-term goals and more about not becoming consumed by that productivity drive.
“The hardest part of remote work is not doing your work, it’s cutting yourself off from doing too much work,” Allen said. “You’re in the zone, or there’s just one more thing, or you want to make sure that everybody else knows you’re doing what you’re supposed to do.”
“So at Buffer, we spend much more energy as a company making sure people aren’t working too much than making sure the work gets done,” he added.
For all the emphasis on asynchronicity, longtime fully remote teams realized early on the importance of social interaction within companies and of replicating in-office camaraderie. Pull-request comments, it turns out, aren’t the best avenue for catching up on your weekend.
When GitLab had a data team of three, weekly sync calls always began with weekend catchup, water-cooler talk and, in general, more personal check-ins. “That’s a great way to get to know what’s going on in your team members’ lives, who they are, who their families are,” Schario said.
When Allen’s data team at Buffer meets weekly via Zoom for its hourlong check-in, team members always enable video, in order to make the connection feel more lived-in. “That’s an important piece of working remotely,” he said. “Whenever we have an internal call, we’re on video, so we can also see people’s nonverbal communication.”
Also, keep those dedicated pet and hobby Slack channels humming, urged both Schario and Allen. “You have to be intentional about creating space to build those relationships,” Schario said.
Seroussi and Allen both mentioned annual companywide meetups and data team meetups at Automattic and Buffer, respectively. Buffer’s most recent all-company meetup was a year ago in San Diego, and the data team got together in Portugal last September, Allen said. Automattic, meanwhile, has a data team meetup scheduled for May in Japan and a companywide gathering slated for September. Both, of course, are up in the air now, but Seroussi is staying optimistic.
“I haven’t booked any flight, but there’s no need to adjust just yet,” he said.