What to Do When You’re an Organization’s First Data Scientist
Ever heard the one about the astrophysicist who decided to become a data scientist? A few years ago, that might’ve sounded like the setup to a bad joke. But that’s the path charted by Brett Salmon, who went from searching for distant galaxies by analyzing images from the Hubble Space Telescope and the Mauna Kea Observatories, to building models that help restaurants make better sense of customer patterns at Los Angeles startup Bridg.
His shift is actually an increasingly common one, as more and more stargazers have realized that their astronomical skill sets — building large catalogs of data and performing statistical analysis on them — translate well to more terrestrial, if slightly less glamorous, environments.
“Ninety-nine percent of either job is sitting at a computer and trying to draw insights from data,” Salmon told Built In. “Galaxies happen to be a lot more romantic, but it’s effectively a lot of similar work.”
In fact, it didn’t take long for Salmon to find an “awesome” parallel between his old cosmic work and his current restaurant and retail focus. On one of his first projects at Bridg, where he was the company’s very first data scientist, he was trying to discover patterns in how diners visited a given restaurant. The patterns looked unusual, but were the visits actually clumping together in meaningful batches?
He devised a method similar to the one used to make sense of star clusters.
“If you see a random bunch of stars, you need a method that tells you whether it’s an actual significant physical star cluster that’s clumped together, or if it’s just consistent with noise,” Salmon said. He simply applied a similar method to measure “the clumpiness of how people behave.”
Salmon added: “I guess you could say that some people behave kind of star-like, and gravitate in a way that’s like physics.”
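Salmon doesn’t spell out the statistic he used, so the following is only a minimal sketch of the general idea, using a swapped-in technique: a Monte Carlo significance test. It counts visits per day, measures clumpiness with the variance-to-mean ratio (close to 1 for random, Poisson-like scatter and larger when visits bunch up), then asks how often randomly scattered visits look at least that clumpy. The function names, the daily binning and the simulation count are all illustrative assumptions.

```python
import random
import statistics


def bin_counts(event_days, n_days):
    """Count events per day; event_days are integer day indexes in [0, n_days)."""
    counts = [0] * n_days
    for d in event_days:
        counts[d] += 1
    return counts


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for random scatter, > 1 when clumped."""
    mean = statistics.fmean(counts)
    return statistics.pvariance(counts) / mean if mean > 0 else 0.0


def clumpiness_p_value(event_days, n_days, n_sims=2000, seed=0):
    """Monte Carlo test (illustrative assumption, not Salmon's exact method):
    how often do uniformly random visits look at least as clumpy as observed?"""
    rng = random.Random(seed)
    observed = dispersion_index(bin_counts(event_days, n_days))
    n_events = len(event_days)
    exceed = 0
    for _ in range(n_sims):
        simulated = [rng.randrange(n_days) for _ in range(n_events)]
        if dispersion_index(bin_counts(simulated, n_days)) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value from being exactly zero.
    return observed, (exceed + 1) / (n_sims + 1)


if __name__ == "__main__":
    # Hypothetical data: 12 visits packed into the first five days of a
    # 60-day window, versus 12 visits spaced evenly every five days.
    clumped = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4]
    spread = list(range(0, 60, 5))
    print(clumpiness_p_value(clumped, 60))
    print(clumpiness_p_value(spread, 60))
```

A small p-value says the bunching is unlikely to be noise; a large one says the pattern is consistent with visits arriving at random, which is the same question an astronomer asks of a candidate star cluster.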
Salmon’s intergalactic CV prepared him well for that early challenge, but taking up the mantle as an organization’s first data scientist surely doesn’t require a stop at NASA, right? Maybe not. But, as with all trailblazers, anyone who’s the first to take on a role at a company will face some unique challenges. Salmon walked us through what to expect on a maiden data-science voyage.
Make Sure the Engineering Bedrock Is Firm
It can go one of two very different ways for a company’s first data scientist. The organization either has its data pipeline built and running when you walk in, or it does not. And even though companies tend to time their hiring better now than they did, say, five years ago, many still miss the mark, which can make for rocky terrain.
If a data scientist is brought on too early, they’ll likely end up doing a lot of engineering work “just to get the data to a place where it can actually provide insights,” Salmon said.
That wasn’t Salmon’s experience at Bridg, which carefully plotted the hiring timeline and had the larger data groundwork in place, he said. “But that’s definitely not to say that I don’t do any work similar to data engineering, because there’s still always data that needs restructuring,” he added.
That includes tasks like feature engineering: molding raw data into feature vectors that a machine-learning algorithm can actually understand.
“The data you need may be in all kinds of different formats and spread all over the place,” Salmon said. “So one of the first things we did was aggregate things together and put [data] into features and formats that are informative to an actual model.”
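As a rough illustration of that aggregation step, the sketch below normalizes two hypothetical transaction exports with different field names into one shape, then rolls each customer’s visits up into a fixed-length feature vector. The field names (`customer_id`, `user`, `ordered_on`, `total_cents` and so on) and the chosen features are assumptions for illustration, not Bridg’s actual schema.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def normalize_pos(rec):
    """Hypothetical point-of-sale export: ISO date string, amount in dollars."""
    return rec["customer_id"], date.fromisoformat(rec["date"]), float(rec["amount"])


def normalize_online(rec):
    """Hypothetical online-ordering export: different keys, amount in cents."""
    return rec["user"], date.fromisoformat(rec["ordered_on"]), rec["total_cents"] / 100


def build_features(records, as_of):
    """Aggregate (customer, date, amount) tuples into one vector per customer:
    [visit count, total spend, average spend, days since last visit,
     average days between visits]."""
    by_customer = defaultdict(list)
    for customer, day, amount in records:
        by_customer[customer].append((day, amount))

    features = {}
    for customer, rows in by_customer.items():
        rows.sort()  # chronological order
        days = [d for d, _ in rows]
        amounts = [a for _, a in rows]
        gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
        features[customer] = [
            len(rows),
            sum(amounts),
            fmean(amounts),
            (as_of - days[-1]).days,
            fmean(gaps) if gaps else 0.0,  # single-visit customers get 0.0
        ]
    return features


if __name__ == "__main__":
    pos = [{"customer_id": "c1", "date": "2024-01-01", "amount": 20.0}]
    online = [{"user": "c1", "ordered_on": "2024-01-11", "total_cents": 1500}]
    records = [normalize_pos(r) for r in pos] + [normalize_online(r) for r in online]
    print(build_features(records, as_of=date(2024, 1, 31)))
```

Keeping the normalization functions separate from the aggregation means a new data source only needs one more small adapter, while the feature logic stays the same.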
“The long part of a data science project isn’t the building and machine learning part — it’s putting everything into production,” he said. “So as the first data scientist, there’s a lot of pain in getting that crank to turn. But once that pipeline is made, it makes it easier for every subsequent project and any subsequent data scientist.”
A data scientist who’s brought on prematurely could end up spending an inordinate amount of time on the comparatively menial data-preparation side of the job. If no one is already dedicated to handling data ingestion, cleaning and aggregation, that work could fall entirely onto the new data scientist’s plate.
Still, things could be much worse.
Salmon recalls a colleague who was hired as a data scientist but was in practice more akin to glorified IT support, being summoned by coworkers to fix random computer problems. “He quickly left that job,” Salmon said.
In a perfect world, all companies would know better. In the real world, however, it’s incumbent on prospective employees to get the lay of the land before signing on as an organization’s first data scientist.
“‘Data scientist’ is a very flashy title that can definitely be thrown around [carelessly],” Salmon said. “A lot of it is on the interviewee to know or vet out which companies are ready or not, data-wise — because the company may not know.”
Get Access and Hammer Out a First Project
One early barrier to expect for a debut data scientist is logistical: data access.
“Especially if a company deals with sensitive data, you need to go through the right channels to have appropriate access,” Salmon said. “And if you’re the first data scientist of an organization, that pathway hasn’t been forged yet.”
Getting the necessary permissions can be a struggle, since the company likely won’t be eager to hand administrative rights to someone it’s still onboarding. Be patient, understand that gaining access to various databases or changing access rules takes time, and use the opportunity to better understand neighboring departments.
“A lot of the first part of the job is just talking to a lot of different people in departments, trying to figure out not only things like access, but also trying to understand the pain points of the business,” he said.
That interaction can also answer one of the biggest questions facing an organization’s first data scientist: What will their first project be?
For some, that’s already been mapped out by the C-suite when they walk in the door. But not always.
“My leadership team gave me a lot of space to come up with my own first projects,” Salmon said. “I’ve worked in tandem with them to figure out the lucrative first projects that we can make and also the projects that would probably take longer or [require] more resources but are the Holy Grail of this industry.”
Data science is often likened to drinking from a firehose, and that feeling might be even more pronounced as an organization’s first. That can make it difficult to gauge the proper timeline for a first project. Add in internal expectations, both real and perceived, and things get even trickier.
“When I joined as Bridg’s first data scientist, I felt my own internal pressure to deliver something fast,” Salmon said. “But you have to realize [your first project] doesn’t necessarily have to be some big neural-net AI production-phase project for you to feel like you’re contributing.”
Keep an eye on ambitious opportunities, but at the same time, poke around the company’s data streams and look for places where your previous experience can provide small improvements. “That way you’re not waiting seven months or so before you’re giving anything good,” he said.
Know When to Watch for Drift
A company that’s just welcomed its first data scientist hasn’t yet had to contend with the challenges of ongoing machine-learning maintenance. But that will change. Once that first model goes into production, the data team not only has to stay vigilant against data drift, where the incoming data gradually shifts away from what the model was trained on, but also has to recognize when that task has grown big enough to require reinforcements.
Even though Salmon is still in the midst of the product cycle and getting ready to deploy, it’s something he monitors closely.
“Once products are in production, you need to start asking whether it’s time to hire more,” he said. “You can’t be juggling too many spinning plates. If you’re trying to monitor existing products yet at the same time build new ones, you need to start having a team to handle that growth.”
Bridg hasn’t reached that point yet, but Salmon knows it’s a matter of time. “We’re still very much actively building and making decisions about which models are most interesting to customers and most feasible to produce and automate,” he said.
“But once things are really up and running, you have to say, ‘OK, we’re ready to bring on somebody else.’”
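To make the idea of watching for data drift more concrete, here is a minimal sketch using the population stability index (PSI), one common way to compare a feature’s live distribution with the distribution a model was trained on. Salmon doesn’t say which drift metric Bridg uses; the bin edges, the function names and the 0.25 alert threshold (a commonly cited rule of thumb) are assumptions.

```python
import math
from bisect import bisect_right


def bin_fractions(values, edges):
    """Fraction of values in each bin defined by sorted edges (len(edges) + 1 bins)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    n = len(values)
    return [c / n for c in counts]


def population_stability_index(reference, current, edges, eps=1e-6):
    """PSI = sum over bins of (cur - ref) * ln(cur / ref).
    eps guards against empty bins, which would otherwise give log(0)."""
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    psi = 0.0
    for r, c in zip(ref, cur):
        r, c = max(r, eps), max(c, eps)
        psi += (c - r) * math.log(c / r)
    return psi


def drift_alert(psi, threshold=0.25):
    """Assumed rule of thumb: PSI above about 0.25 suggests significant drift."""
    return psi > threshold


if __name__ == "__main__":
    training = list(range(100))               # hypothetical training-time feature values
    live = [v + 50 for v in range(100)]       # hypothetical live values, shifted upward
    psi = population_stability_index(training, live, edges=[25, 50, 75])
    print(round(psi, 3), drift_alert(psi))
```

In practice a check like this would run on a schedule for each model input, and a steadily growing list of alerts to triage is one sign that, as Salmon says, it may be time to bring on somebody else.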