Data science and machine learning are closely related fields, so there's some confusion about what to call specialists in them. As a job seeker, do you look for data scientist roles or machine learning engineer openings? How is a data analyst different? How about a machine learning scientist? Or do machine learning engineer jobs fall under a more general software engineer heading?
The industry is just as confused as you are, and there is little consistency between employers about what they call these roles. But there are some emerging trends we can try to break down.
Data Scientist vs. Machine Learning Scientist
Let’s start by defining what employers mean by a “data scientist.” Searching current job openings at Amazon and Microsoft shows they are looking for the same sort of person.
For example, here are the requirements in a job posting for a typical data scientist role at Amazon:
Let’s compare that to a machine learning scientist, posted in the same company:
These postings have similar requirements, but they differ in some important ways. Both openings are looking for PhD-level education; if you want to call yourself a "scientist," that's often expected. Both hiring managers are looking for people with academic research backgrounds, and the ML scientist role even asks specifically for your publication record. I was surprised to see Amazon looking for academics. Before the fields of data science and machine learning took off in 2013, Amazon strongly preferred candidates with industry experience over candidates with fancy degrees (I was a senior manager there back then). That seems to be changing, at least in some departments.
Both positions involve extracting meaning from raw data, but the difference is in their focus.
Data scientists analyze data using existing tools, databases and scripts. They use SQL, R, SAS and Matlab, along with open-source distributed data platforms and query tools such as Spark, Elasticsearch, Hadoop, Pig and Hive. They deal with large-scale data analysis; knowing about machine learning is just a "nice to have." Data scientists should know how to write scripts in R or Python, but they don't have to be software engineers.
In contrast, the machine learning scientist role requires much stronger software engineering skills. The posting asks for "strong software development skills," and the ability to write code in Java or C++ is a basic qualification. The company wants people who can develop new tools and systems (in this case, related to natural language processing and speech recognition), not just people who can use existing tools to analyze data and extract meaning from it.
Machine Learning Scientist vs. Machine Learning Engineer
Generally, we describe a “machine learning engineer” as a software engineer who specializes in machine learning or artificial intelligence. An ML engineer is a software developer first and a machine learning expert second. To illustrate this, let’s look at the requirements for a machine learning engineer posting from Microsoft:
There’s actually not much at all about machine learning in this listing! It doesn’t even get mentioned until the “preferred qualifications” section.
What you also don’t see is an explicit focus on academic or research backgrounds. This job is for people with a proven track record of building large systems, not people who focus on theory. That’s the main difference between a machine learning “engineer” and a machine learning “scientist” or a data scientist.
What Do Machine Learning Engineers Do?
The largest tech employers have a long history of focusing their hiring on talented software engineers. They believe that a good software engineer is smart enough to learn additional skills independently. They want employees who are fungible and can build whatever the company needs in the long run, even if it doesn't involve machine learning.
Sometimes, the job title a hiring manager chooses is just a strategic choice on their part. In this case, it seems they really are looking for a more general software engineer – but the hiring manager may just be trying to avoid competing with other hiring managers posting for “software engineer” positions. By calling this role a “machine learning engineer,” they can differentiate their posting and attract people interested in ML.
The same thing might be happening with some “machine learning scientist” roles. The “scientist” title is appealing to some applicants and might make some candidates apply to the “scientist” role instead of the “engineer” role, even though they are similar. Even in a struggling economy, talented software engineers are hard to find, and hiring managers will use any trick they can to attract them.
Data Scientist vs. Data Analyst
Historically, data analysts were people who worked mostly with relational databases and spreadsheets. Their job was to collect data, visualize it and present it to people who use it to make business decisions.
Lately, data analysts have been trying to rebrand themselves with the more lucrative data scientist title. The difference is subtle enough that some smaller employers might hire a data analyst as a data scientist, so sometimes this works.
I think of a data analyst as a data scientist in training. Data analysts primarily work with databases, data warehouses, spreadsheets and high-level tools such as Tableau to analyze data. Data scientists have a stronger statistical background and can use more advanced tools like R or Matlab to script their analysis. It’s a natural progression for a data analyst to learn data science skills, but for now, they are distinct professions.
As an example, let’s look at an Amazon job description for a data analyst role:
It's worth noting that I could only find data analyst roles in India; elsewhere, Amazon is looking for data scientists. The main difference between this posting and the ones we've looked at for data scientists and machine learning scientists is the level of education required: a bachelor's degree is the only requirement for an analyst, not a PhD. The focus on tools is also different. The data analyst should know how to use Microsoft Office and SQL, but knowing R, Python or Matlab isn't required.
Even so, Amazon is looking for people who can become data scientists. It prefers candidates who have dabbled in R and Python, and a degree in computer science or engineering is just as valuable as a degree in math for this role. The difference between a data analyst and a data scientist is that a data scientist can write code, and Amazon wants analysts with the potential to make that leap.
Are the Salaries Different?
Given the inconsistency in how people use these titles, it’s hard to read much into average salary data. But Glassdoor offers some data on reported salaries for these different roles.
One thing is clear: A “data analyst” is widely considered a more junior-level position than a data scientist, ML engineer or ML scientist. Machine learning engineers and machine learning scientists reported identical salaries, so Glassdoor seems to consider those titles interchangeable. Usage of those titles varies by company; Amazon doesn’t hire “machine learning engineers,” but they do hire “machine learning scientists” and “software engineers.” Sometimes it just comes down to quirks in how individual companies classify their jobs internally.
Software engineers came in at a lower salary than people specializing in data science or machine learning ($92K vs. $114K on average). This may be skewed, as people calling themselves “machine learning engineers” likely have more experience than an entry-level software engineer would. In reality, many jobs labeled as “software engineer” may involve quite a bit of data science and machine learning if they are at a large company that deals with those fields routinely.
But whether you call yourself a data scientist, a machine learning engineer or a machine learning scientist, your salary will be comparable.
Well, What Jobs Should I Apply For?
You could grossly simplify matters with the following Venn diagram:
If you understand statistics, data analysis and visualization, scripting languages such as Python or R, and can use some advanced tools such as Matlab, but you don’t consider yourself a software engineer, you are looking for a data scientist job.
Suppose you are fundamentally a software engineer with a background in building large, distributed systems, and you’ve managed to learn machine learning and data science along the way. In that case, you are looking for a machine learning scientist or machine learning engineer job.
This diagram does gloss over the differences between data science and machine learning, but data scientists tend to know about machine learning these days, and vice versa.
To find the best jobs, you shouldn’t restrict your search just to those terms. Many fascinating engineering jobs involving machine learning still fall under the title “software engineer.” If you search only for a “machine learning engineer” title, you’ll miss out on many software engineering positions that really do involve machine learning engineering. And searching for that specific title would almost lock you out of Amazon, where they call them “machine learning scientists” instead. Broaden your search to any job with “machine learning” in the title – you might find some great openings that other candidates have overlooked.
* * *
This article was originally posted on Udemy’s Blog: Machine Learning Engineer vs. Data Scientist: What’s in a Name?