A few years ago, Michael Kaminsky went to HR to try to convince his employer to create a new role. A key step in the business intelligence process was routinely falling through the cracks, and the company needed to formalize ownership of it.
Here was the problem: The analytics team that Kaminsky led would go to the data engineering side needing support to build pipelines for whatever BI question they happened to be working on. But data engineering had its own (very different) priorities. It was focused on larger infrastructure concerns around ingestion and warehousing. When Kaminsky would approach data engineering to, say, build out a better customer lifetime value model, he’d hit a wall of resistance.
“They would be like, ‘I do not want to do that. I will not do that.’ And they wouldn’t do it,” he said.
Kaminsky suspected his experience wasn’t unique. “I think there are a lot of firms where that’s true,” he said. “There’s a data engineering team in charge of a Kafka pipeline, moving production data from point A to point B, and they are not interested and do not want to work on anything that involves business logic.”
Kaminsky got his wish. The title of the new role — analytics engineer — was not yet widely in use, but it was the correct choice. It spoke to the software development know-how required by the job while underscoring that its focus was analysis, not streaming data platforms.
What Is Analytics Engineering?
In the broader world, the role continues to evolve and varies a bit by organization, but, generally speaking, analytics engineers are responsible for transforming data into clean, accurate data sets to be used by data analysts or other end users. That also means applying software engineering best practices to the code used to build and feed those data sets. Just as developers manage source code changes through a centralized repository, so too do analytics engineers for analytics code.
Today, analytics engineering is very much in wide use. (Kaminsky played a part, helping concretize the concept in a popular 2019 blog post.) The role has proliferated in many companies and generated significant chatter in data circles, where it’s been championed as a transformative “superpower” and fretted over as a potential prelude to overemphasizing technical skills.
At the same time, resources haven’t quite kept pace with the role’s growth. One option is the self-guided course offered by dbt, one of the foundational tools in analytics engineering. (Dbt is used to transform data. It’s a command-line tool that compiles code into SQL and then executes that SQL against the data warehouse.) There’s also the Jaffle shop project and Udemy’s “from scratch” dbt course. Still, the ecosystem is hardly robust compared to that for coding or data science.
That led Kaminsky and Claire Carroll, who also helped define and mainstream the role while working as a community manager and analytics engineer at dbt Labs, to launch Analytics Engineering Club, a new eight-week upskilling program to teach the required skill set.
Here’s what their experience in shaping analytics engineering and building one of its first major education resources can tell people who are curious about making the transition.
Analytics Looks More and More Like Software Engineering
As mentioned, analytics engineering is still taking shape. For instance, the Holistics Blog pointed out that Carroll’s vision differs slightly from how Spotify implemented the role. But a commonality across the board has to do with tooling. Along with dbt, warehousing platforms like Snowflake and BigQuery, ingestion tools like Fivetran and Stitch, and BI platforms such as Looker, Mode and Chartio came to make up the “modern data stack.” Together, these shifted the nature of data functions in many organizations.
Dbt democratized data transformation for analysts, who no longer had to rely on the sometimes surly, Kafka-focused data engineers of Kaminsky’s example to spearhead pipelines. Being a command line tool, it also further emphasized coding.
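To make the compile-then-execute idea concrete, here is a minimal sketch in Python. It is not dbt's implementation: dbt uses Jinja templating and connects to a real warehouse, while this sketch uses the standard library's string.Template and an in-memory SQLite database as a stand-in. The table names (raw_orders, customer_orders) and the helper functions are hypothetical, chosen only for illustration.

```python
import sqlite3
from string import Template

# Hypothetical "model": a SQL template with a placeholder for the source table.
MODEL = Template(
    "CREATE TABLE customer_orders AS "
    "SELECT customer_id, COUNT(*) AS order_count "
    "FROM $source GROUP BY customer_id"
)


def compile_model(model, **context):
    """Step 1: render the template into plain SQL."""
    return model.substitute(**context)


def run_model(conn, sql):
    """Step 2: execute the compiled SQL against the (stand-in) warehouse."""
    conn.execute(sql)


# An in-memory SQLite database plays the role of the data warehouse here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)

sql = compile_model(MODEL, source="raw_orders")
run_model(conn, sql)

result = conn.execute(
    "SELECT customer_id, order_count FROM customer_orders ORDER BY customer_id"
).fetchall()
print(result)  # [(1, 2), (2, 1)]
```

Because the template and the compiled SQL are both plain text, they can be checked into a git repository and reviewed like any other code.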
As analytics starts to look more like software engineering, understanding established engineering practices becomes more important. Carroll wrote in a May post that her team required people to check their SQL queries into git repositories to manage code changes, adding that “there’s a lot of ground between writing in a web [integrated development environment] and being proficient at version control.”
“If you’re building production software and you don’t have a software engineering background, there are a lot of ways to shoot yourself in the foot … because you’re not using best practices that most software engineers are familiar with,” said Kaminsky.
That’s why it’s important for those interested in analytics engineering to eventually get comfortable with things like version control and command line navigation. Those are also what AEC students will focus on first — “basic things that a lot of software engineers take for granted that [newcomers] find incredibly intimidating,” Kaminsky said.
Other software best practices that analytics engineers ultimately need to know include unit testing and integration testing. Just as software engineers test small sections, or units, of code, analytics engineers need to ensure that individual data transformations behave as expected. And like their software counterparts who check that integrations of individual components work together as a whole, analytics engineers will ensure that any new tables or transformations play nice with the rest of a data pipeline.
These, along with streamlined documentation and monitoring, are the best-practice “guardrails” that prevent updates from derailing pipelines and, if something should go wrong, allow teams to retreat to older, safer versions, as Nubank analytics engineers Ariane Hoffenberg and João Pedro wrote in a recent post.
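The unit and integration checks described above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not a method described by anyone quoted here: the two transformations (clean_orders and monthly_revenue), the field names and the sample rows are all hypothetical, and a real team would more likely run such checks with a test framework or its transformation tool's built-in tests.

```python
def clean_orders(rows):
    """Transformation 1: drop rows without an amount and normalize status."""
    return [
        {**r, "status": r["status"].strip().lower()}
        for r in rows
        if r.get("amount") is not None
    ]


def monthly_revenue(rows):
    """Transformation 2: sum completed order amounts per YYYY-MM month."""
    totals = {}
    for r in rows:
        if r["status"] == "completed":
            month = r["order_date"][:7]
            totals[month] = totals.get(month, 0) + r["amount"]
    return totals


# Deliberately messy, hypothetical input rows.
RAW = [
    {"order_date": "2021-01-05", "amount": 20, "status": " Completed "},
    {"order_date": "2021-01-20", "amount": None, "status": "completed"},
    {"order_date": "2021-02-01", "amount": 15, "status": "returned"},
    {"order_date": "2021-02-03", "amount": 30, "status": "COMPLETED"},
]


def test_clean_orders_unit():
    """Unit test: one transformation, checked in isolation."""
    cleaned = clean_orders(RAW)
    assert len(cleaned) == 3
    assert {r["status"] for r in cleaned} == {"completed", "returned"}


def test_pipeline_integration():
    """Integration test: both transformations chained end to end."""
    assert monthly_revenue(clean_orders(RAW)) == {"2021-01": 20, "2021-02": 30}


if __name__ == "__main__":
    test_clean_orders_unit()
    test_pipeline_integration()
    print("all checks passed")
```

The unit test would catch a regression in the cleaning step alone, while the integration test would catch a change in one step that quietly breaks the step that consumes its output.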
Expect Some Limited Access to Resources and Tech
Access can be a challenge for those trying to transition into analytics engineering. Even people inside a data team may not be able to acquire important knowledge internally due to the speed and structure of operations.
“There are always people that want numbers and reports in a company, and you’re in such a defensive and reactive state, that sometimes you don’t get the opportunity to take a step back and learn how to do things the right way,” said Carroll.
People who are trying to perform the duties that an analytics engineer would — getting data sets pristine for analysts, running version control and continuous integration on analytics code — generally don’t have many mentors to engage with, unlike, for instance, software engineering teams, which usually have deeper benches and several senior contributors.
“The teams are very small, if they exist at all. … A lot of times analytics engineers are really figuring it out for themselves on the production system,” said Kaminsky. “And there’s really no one in the organization who can answer a question.”
But those who have to look outside their employer to become proficient should also avoid resources that rely on artificially clean data sets. It’s important to encounter data in a more realistic form than students often see in bootcamps: think messy data spread across multiple tables in BigQuery, not clean data in a single table or CSV file.
“You will have to do some amount of transformation, think about the data, how it’s structured and how that maps on to the question you’re trying to answer,” said Kaminsky of the data sets students will encounter at the AEC.
Who Should Become an Analytics Engineer?
Neither Kaminsky nor Carroll came to analytics engineering as software developers, having shifted to the role from data science and data analytics, respectively. That seems to be the path taken by many — which may be for the best.
“There are benefits that come with coming from a non-engineering background,” Kaminsky said. “I think a lot of these people bring really strong business perspectives and a strong ability to work with data.”
In her 2019 post that helped define analytics engineering, Carroll offered a series of questions that illustrated the kind of problems that analytics engineers care about. They included considerations like how to answer more business-intelligence questions with fewer tables, and how to make naming conventions for tables as direct and understandable as possible.
In other words, analytics engineers are people who really care about streamlining data transformation and prepping data sets as efficiently as possible. Software engineering best practices can — and need to — be learned, but an innate curiosity about those sorts of challenges is probably more important.
Of course, people with a natural interest in software engineering often find themselves drifting toward this realm. Mitchell Silverman, an analytics engineer at Spotify, told Dataform that their analytics engineers tend to be people who “used to be analysts or data scientists who for whatever reason ended up getting more into software engineering and doing some backend work, and over time have built up the skill set.”
But even someone who may initially be wary of those more technical aspects can succeed, like “an analyst that knows how to write the SQL to answer a question like ‘What’s my monthly revenue?’ but feels intimidated by the command line or git,” as Carroll wrote in a recent blog post.
At the same time, people looking to transition into analytics engineering should try to work in tandem with their employer to determine if it’s a good fit and to best establish that knowledge in their work. AEC students, for instance, will be encouraged to work with their employers to get the necessary programs installed in their shops, “so they can continuously map back to their own work environment,” said Kaminsky.
What are some signs that your data team could benefit from adding one or more analytics engineers? Consider output and size. If business intelligence reporting or “deep-dive analysis” routinely takes longer than anticipated, that’s an indication, Andres Recalde, director of data and analytics at Banza, said in a talk late last year.
Still, if a data team isn’t expected to grow to a size where meaningful specialization makes sense, that may not be enough to necessitate an analytics engineering team. It might make sense, however, for a mid-sized or enterprise organization that has more than 10 people focused on data, he said.
Role or Skill?
Of course, if an organization is struggling with slow or mish-mashed reporting, it still needs to get its data-quality foundations set, regardless of team size. So whether a company considers analytics engineering a role or simply a competency, it still needs to think hard about how to ensure code quality for all its pipelines.
Analytics engineers tend to manage data dictionaries and help reinforce software engineering standards across the data team. The more someone can focus on that specialization, the stronger the downstream effects become. That might be a point in favor of a dedicated analytics engineering role.
“The salary for analysts and data scientists keeps going up, so if you can hire one person who can make your whole team of five to 10 analysts or data scientists much more effective by bringing a software engineering lens to the tools they’re building that pays for itself, usually,” said Kaminsky.
Still, whether the work gets done matters much more than which hat it falls under.
“I don’t think it matters hugely if it’s a data engineer title or analytics engineer title, or who’s doing the work, but it is important to recognize that someone needs to do it,” said Kaminsky.