AutoML: Automated Machine Learning Explained

Automation is a key concept in the ongoing conversation about artificial intelligence. Now, AI is automating itself — in a process known as automated machine learning.

Essentially, automated machine learning (AutoML) works by having algorithms take over the process of building a machine learning model. It handles the more mundane, repetitive tasks of machine learning, with the promise of both speeding up the AI development process and making the technology more accessible.

What Is Automated Machine Learning (AutoML)?

Automated machine learning, or AutoML, applies algorithms to handle the more time-consuming, iterative tasks of building a machine learning model. This could include everything from data preparation to training to the selection of models and algorithms — all of which is done in a completely automated way.

What Is AutoML?

AutoML is the process of automating the tasks of developing machine learning models. That includes preprocessing data, engineering features, choosing models and tuning hyperparameters. The idea is to make machine learning development more efficient and accessible to those without ML expertise. AI talent shortages present even more opportunities for automated machine learning to make an impact.

“To me, I don’t see another way forward except for these more automated approaches,” Sarah Aerni, a VP of machine learning and engineering at Salesforce, told Built In. “There are too many opportunities for AI and simply not enough people to onboard to the business, onboard to the tech, deploy it into production, monitor it, and continue iterating on it. To me, AutoML is where that enters as a solution to scaling.”

Although the concept of automated machine learning has been around for nearly a decade, it remains a work in progress. If and when AI-made AI does reach its full potential, it could be applied beyond the borders of tech companies, changing the game in spaces like healthcare, finance and education.

“Practically anybody who uses machine learning will also use automated machine learning,” Lars Kotthoff, an assistant professor and researcher at the University of Wyoming’s computer science department, told Built In. “Eventually, this will really be deployed everywhere machine learning and AI is used.”

How Does AutoML Work?

Automated machine learning is “mostly about” supervised machine learning, meaning it gives users information about the outcome that they’re trying to predict by creating a model that identifies patterns in labeled data, explained Kjell Carlsson, head of data science strategy and evangelism at Domino Data Lab.

With supervised learning, labeled input and output data are repeatedly fed into a human-trained system, whose predictions become more accurate as it sees each new data set.

For example, if a company wants to predict whether somebody is going to buy its product, it first needs a data set of past customers, organized by who bought and who didn’t. Then it has to use that data set to predict what a whole new set of customers will decide to do. Or, if you want a computer to identify a cat in a video, you first have to train it by showing it other videos with cats so it can accurately identify one in a video it hasn’t seen before.
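
As a minimal sketch of that idea, the Python below predicts purchases for new customers from a tiny, made-up history of past customers. The two features (site visits and minutes spent on a pricing page), the data and the nearest-neighbor voting rule are all assumptions chosen for illustration, not a method described by anyone quoted here.

```python
from math import dist

# Hypothetical labeled history: (site_visits, minutes_on_pricing_page) -> bought?
past_customers = [
    ((1, 0.5), False),
    ((2, 1.0), False),
    ((3, 0.0), False),
    ((8, 6.0), True),
    ((9, 4.5), True),
    ((7, 5.0), True),
]


def predict_purchase(features, labeled_examples, k=3):
    """Predict by majority vote among the k closest labeled examples."""
    nearest = sorted(labeled_examples, key=lambda ex: dist(ex[0], features))[:k]
    votes_for_buy = sum(1 for _, bought in nearest if bought)
    return votes_for_buy * 2 > k


# Customers the system has never seen before (also made up).
new_customers = [(8, 5.5), (2, 0.2)]
predictions = [predict_purchase(c, past_customers) for c in new_customers]
```

Here the first new customer resembles past buyers and the second resembles past non-buyers, so the predictions follow the labeled history.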

Because AutoML algorithms operate at a level of abstraction above the underlying machine learning models, relying only on the outputs of those models as guides, they can also be applied to pre-trained models to gain fresh insights without having to repeat existing research or waste computation power.
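
One way to picture that level of abstraction is a search loop that never looks inside the candidate models and only compares their validation scores. The sketch below is a toy under stated assumptions: the two model families, their settings and the data are hypothetical, and real AutoML tools use far more sophisticated search strategies than this exhaustive loop.

```python
# Toy labeled data: (feature, label). All values are made up.
train = [(0.1, 0), (0.4, 0), (0.35, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
valid = [(0.2, 0), (0.45, 0), (0.55, 1), (0.7, 1)]


def make_threshold_model(threshold):
    """A fixed rule: predict 1 when the feature reaches the threshold."""
    return lambda x: int(x >= threshold)


def make_knn_model(k):
    """Majority vote among the k training points closest to x."""
    def predict(x):
        nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
        return int(sum(label for _, label in nearest) * 2 > k)
    return predict


# Hypothetical search space: model family -> (builder, settings to try).
search_space = {
    "threshold": (make_threshold_model, [0.3, 0.5, 0.7]),
    "knn": (make_knn_model, [1, 3, 5]),
}


def validation_accuracy(model):
    return sum(model(x) == y for x, y in valid) / len(valid)


def auto_select():
    """Try every candidate and keep the first one with the best score."""
    best = None
    for family, (builder, settings) in search_space.items():
        for setting in settings:
            score = validation_accuracy(builder(setting))
            if best is None or score > best[0]:
                best = (score, family, setting)
    return best


best_score, best_family, best_setting = auto_select()
```

The loop treats every candidate as a black box that maps inputs to predictions, which is why the same approach could in principle be pointed at models that were trained elsewhere.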

Why Is AutoML Important?

The goal of AutoML is both to speed up the AI development process and to make the technology more accessible.

Much of the work required to make a machine learning model is rather laborious, and requires data scientists to make a lot of different decisions. They have to decide how many layers to include in neural networks, how the weights on each node’s inputs should be trained, which algorithms to use and more. It’s a job that requires a lot of specialized skill and intuition to do properly.

The more complex the model, the more complex the work. And some experts say automating some of that work will be necessary as AI systems become more complex. So, AutoML aims to eliminate the guesswork for humans by taking over the decisions data scientists and researchers currently have to make while designing their machine learning models.

Eventually, the goal is to get to the point where a person can ask a question of their data, apply an AutoML tool to it, and receive the result they are looking for without needing overly technical skills. And while a growing number of companies are looking to democratize machine learning through AutoML, the technology remains largely exclusive to people with AI and data science expertise. It’s a tool rather than a specific platform, and a tool with fairly narrow uses, according to Carlsson.

Advantages of AutoML

AutoML promises a range of benefits and is well-suited to handle problems that require the creation and regular updating of hundreds of thousands of models.

Can Tackle Varied Models and Complex Tasks

Often, these are forecasting models. For example, if a healthcare provider wanted to predict demand for different units across its network of hospitals, it would need to create different models not only for each hospital, but also for the different units within those hospitals and for different time frames (one week out, three months out and so on). The result is thousands of models, the creation and retraining of which requires an immense amount of work for a human data scientist.
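
To see how quickly the model count grows, the sketch below builds one forecaster per combination of hospital, unit and time frame. The hospital and unit names, the demand history and the naive averaging "model" are hypothetical stand-ins; a real system would train a proper forecasting model on each combination's own data.

```python
from itertools import product
from statistics import mean

# Hypothetical network: every name and number here is made up.
hospitals = ["north", "south", "east"]
units = ["icu", "er", "maternity", "oncology"]
horizons_in_weeks = [1, 4, 13]

# Made-up weekly demand, shared by every combination for brevity.
history = [40, 42, 38, 45, 47, 44]


def fit_naive_forecaster(past_demand):
    """Stand-in for real training: forecast the mean of the last 4 weeks."""
    level = mean(past_demand[-4:])
    return lambda: level


# One model per (hospital, unit, horizon) combination.
models = {
    key: fit_naive_forecaster(history)
    for key in product(hospitals, units, horizons_in_weeks)
}
model_count = len(models)
```

Even this small example yields 3 x 4 x 3 = 36 models, and each one would need retraining as new data arrives.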

“AutoML models work really, really well in these kinds of instances,” Carlsson said.

More Accurate Than Humans

AutoML generally isn’t prone to the same kind of forgetfulness or shortsightedness that humans are — especially when faced with big, complex problems.

“Using these automated approaches tends to get better results than humans can achieve, simply because the machine doesn’t make mistakes,” Kotthoff said. “It takes all of this information I gathered in a principled fashion and then makes the decisions based on that, where humans are prone to forget things.”

Cost-Efficient ML Development

The biggest advantage of automated machine learning is that data scientists don’t have to do the hard, monotonous work of building ML models manually anymore.

“It’s really something that, in the end, will enable humans to work better and do more work in a small amount of time because they don’t have to do the tedious parts,” Kotthoff said.

Wider Accessibility for Non-Technical Personnel

Because AutoML can handle different parts of the machine learning development process, data scientists don’t need to have extensive knowledge of ML techniques and models. This makes machine learning technology much more accessible to a broader audience, including professionals who come from fields outside of AI.

Increased Scalability of ML Systems

AutoML is designed to handle demanding tasks, making it ideal for companies looking to upgrade their ML workflows to process larger volumes of data. In addition, AutoML can automate the process of training ML models. This speeds up the training process and makes it more feasible for businesses to train ML models for new challenges.

Challenges of AutoML

Although AutoML offers plenty of upsides, the technology also comes with downsides that need to be taken into account.

Disregard of Business and Human Contexts

Imagine the benefit of a sale at your company is $100, and the cost of pursuing a lead is $1. You might be okay with relying on a machine learning model that gives you 99 wrong predictions for every one person who buys $100 worth of product, since you would still break even. But then let’s say your sales capacity only permits 20 calls. That creates a whole new set of restrictions.
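
The arithmetic can be sketched in a few lines, using the $100 benefit and $1 lead cost from the example above. The two sets of purchase probabilities are invented for illustration: both expect one sale per 100 leads, but only one of them concentrates the likely buyers among its top-ranked leads.

```python
SALE_BENEFIT = 100.0  # benefit of one sale, from the example above
LEAD_COST = 1.0       # cost of pursuing one lead, from the example above


def expected_profit(purchase_probabilities, call_capacity=None):
    """Expected profit when the most promising leads are called first."""
    ranked = sorted(purchase_probabilities, reverse=True)
    called = ranked if call_capacity is None else ranked[:call_capacity]
    return sum(p * SALE_BENEFIT - LEAD_COST for p in called)


# Hypothetical model A: every lead is equally likely (1 in 100) to buy.
uniform_model = [0.01] * 100
# Hypothetical model B: the likely buyers sit among the top 20 leads.
ranking_model = [0.05] * 20 + [0.0] * 80

unlimited_uniform = expected_profit(uniform_model)    # roughly break-even
limited_uniform = expected_profit(uniform_model, 20)  # still roughly zero
limited_ranking = expected_profit(ranking_model, 20)  # roughly $80
```

With unlimited calls, both hypothetical models break even. Once only 20 calls are possible, the model that ranks leads well is the only one that makes money, which is the kind of restriction a purely accuracy-driven search would not see.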

“The problem with traditional AutoML is that it doesn’t start from the business reality,” Arijit Sengupta, founder and CEO of Aible, told Built In. “It just tries various parameters and a bunch of models, and comes back and tells you, ‘Here is the best model.’ And that genuinely is completely useless.”

Lack of ML Model Standards

There’s no set standard for what a “good” AI model looks like. Is it based on just accuracy? Does speed contribute? Or its ability to learn? Either way, Carlsson said those metrics very rarely match up to what the business problem actually is.

“The joke is that all of us can create a model that will predict terrorist activity with 99.99 percent accuracy — we just predict that there’s never any terrorism,” Carlsson said. “Terrorism happens so infrequently that if I just predict that terrorism never happens, I’ve got this super accurate model. But it’s a useless model.”
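
Carlsson’s joke is easy to reproduce. The sketch below scores a "model" that always predicts the negative class on a made-up data set with one positive case in 10,000. The data and the choice of recall as the second metric are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual positive cases that the model caught."""
    positives = sum(y_true)
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return caught / positives if positives else 0.0


# Made-up data: one rare positive event among 10,000 cases.
y_true = [1] + [0] * 9_999
# A "model" that always predicts that nothing happens.
y_pred = [0] * 10_000

model_accuracy = accuracy(y_true, y_pred)  # 0.9999
model_recall = recall(y_true, y_pred)      # 0.0
```

The accuracy is 99.99 percent, yet the model catches none of the events it exists to find.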

Black Box Effects That Reduce Transparency

Automated machine learning doesn’t offer the “why” of its decision-making process, which is something most of us crave when it comes to trust.

But Kotthoff said it is “quite challenging” to actually achieve that, especially in the case of AutoML, “because of the complexity of this whole machinery and the many decisions that are being made automatically under the hood.”

Demanding Costs and Data Needs

AutoML can be expensive for teams to set up and maintain over time. In addition, it’s another technology that requires large amounts of high-quality data to function properly. If companies don’t have the data science personnel to monitor these systems or don’t have enough data, it may not be worth pursuing AutoML solutions.

Inability to Understand Ethics

AutoML doesn’t have a built-in conception of fairness. You can impose different constraints in an effort to be fair — like equal rejection rate, equal acceptance rate, equal likelihood of success — and then make sure that the AI serves that definition of fairness, but Sengupta says that falls outside the scope of what AutoML is capable of doing because humans have to set those constraints.

“That’s the danger with AutoML,” Sengupta added. “You end up doing the wrong business things and you do the wrong ethical things because the only thing the AutoML system understands is the data.”
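
As a rough illustration of one such human-defined constraint, the sketch below checks whether a set of decisions has an equal acceptance rate across groups. The group labels, the decisions and the 5-percentage-point tolerance are all hypothetical; choosing the definition of fairness and the tolerance remains a human decision.

```python
from collections import defaultdict


def acceptance_rates(decisions):
    """decisions: iterable of (group, accepted) pairs -> rate per group."""
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for group, was_accepted in decisions:
        totals[group] += 1
        accepted[group] += int(was_accepted)
    return {group: accepted[group] / totals[group] for group in totals}


def satisfies_equal_acceptance(decisions, tolerance=0.05):
    """True when the gap between group acceptance rates is within tolerance."""
    rates = acceptance_rates(decisions).values()
    return max(rates) - min(rates) <= tolerance


# Made-up model decisions for two hypothetical groups.
decisions = (
    [("a", True)] * 6 + [("a", False)] * 4
    + [("b", True)] * 3 + [("b", False)] * 7
)

rates = acceptance_rates(decisions)                # a: 0.6, b: 0.3
is_fair = satisfies_equal_acceptance(decisions)    # False
```

A check like this can be bolted onto a model search, but the system has no way of knowing that it should be unless someone specifies it.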

AutoML Examples and Use Cases

AutoML can be applied to advanced artificial intelligence applications, or to simpler problems found in conventional businesses that don’t have enough people to do it all.

AutoML Use Cases

Predicting customer churn
Building customer lifetime value models
Predicting equipment failures
Grouping like products together on an e-commerce site
Predicting the success of an email marketing campaign

AutoML Example: Salesforce

Salesforce has thousands of customers that are looking to predict a variety of things, from customer churn to email marketing click-throughs to equipment failures. And all of this requires lots of rich data that is unique to their specific business, which can be used to build customized machine learning models. Salesforce is focused on making the creation of these models easy and accessible to everyone through automated machine learning.

“In order to leverage that data,” Aerni explained, “[Salesforce is] not able to look at it. So we need to use automated machine learning approaches to train on that customer’s data set, in order to transform that data.” This extends into various stages of the machine learning process, from data preparation to training and selecting models and algorithms that are most appropriate.

AutoML Example: Aible

Sengupta’s company Aible aims to help anyone build an AI model that creates value. Aible does this by offering a suite of software. One tool focuses on augmented data engineering. Another handles augmented analytics, providing companies with key insights into their data in language they can understand. A third offering covers augmented data science and machine learning, handling the predictive model building while also factoring in the benefits of correct predictions and the costs of incorrect ones.

For example, “What’s the benefit of correctly telling you that somebody will buy? What’s the cost of incorrectly telling you somebody will buy when they wouldn’t? And how much capacity do you have to pursue these prospects?” Sengupta explained. “And then our system automatically generates an AI or a set of AI that would create the most economic value, given your unique business.”
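
The general idea of choosing a model by economic value rather than accuracy can be sketched as follows. This is not Aible’s method, which the article does not detail; the benefit, cost, capacity and candidate model statistics are all hypothetical numbers picked to show how the two criteria can disagree.

```python
BENEFIT_PER_CORRECT = 100.0  # hypothetical value of a correctly predicted buyer
COST_PER_INCORRECT = 10.0    # hypothetical cost of chasing a non-buyer
CAPACITY = 50                # hypothetical number of prospects the team can pursue

# Hypothetical summaries of two candidate models on held-out data.
candidates = {
    "cautious": {"accuracy": 0.95, "flagged": 10, "precision": 0.9},
    "broad": {"accuracy": 0.80, "flagged": 200, "precision": 0.3},
}


def economic_value(stats):
    """Expected value of pursuing as many flagged prospects as capacity allows."""
    pursued = min(stats["flagged"], CAPACITY)
    value_per_prospect = (
        stats["precision"] * BENEFIT_PER_CORRECT
        - (1 - stats["precision"]) * COST_PER_INCORRECT
    )
    return pursued * value_per_prospect


best_by_accuracy = max(candidates, key=lambda name: candidates[name]["accuracy"])
best_by_value = max(candidates, key=lambda name: economic_value(candidates[name]))
```

Under these assumed numbers, the more accurate model flags too few prospects to use the available capacity, so the less accurate one produces more expected value.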

Will AutoML Replace Data Scientists?

Like all forms of automation, AutoML is not immune to speculation that it will replace human employees, particularly data scientists. However, AutoML actually hints at a future where data scientists play an even greater role in organizations looking to invest in AI technologies.

AutoML Contributes to the Importance of Data Scientists

The “democratization of data science” was the buzz-phrase when DataRobot first brought this technology to public attention, and it has been reiterated by everyone from Salesforce to Google. But the idea of a business being able to use this technology with absolutely no assistance from data scientists whatsoever hasn’t quite panned out, according to Carlsson.

“There is this view of, ‘Well, if we have the right tools then everybody will be able to do this and we won’t need data scientists anymore.’ I have really never seen that be true,” he said, adding that, if anything, he’s seen folks move in the opposite direction: Companies are hiring more data scientists and training more data analysts so they can become data scientists.

AutoML Requires Data Scientists to Function in the First Place

Not only will AutoML not replace data scientists, Carlsson says, but data scientists are really the only people who benefit from this technology at all. And even then it’s only “incrementally beneficial” to them, mainly because the tools require so much additional guidance.

Data teams might use AutoML a little in the beginning to do some exploratory analysis, but when it comes down to making the “real model,” they’re going to create it from scratch themselves.

“It turns out, you actually need folks who understand the data, know how to look at and analyze the distribution of that data, and know how to analyze the results of that data — the validation of the data — in order for you to create a model that actually makes any sense,” Carlsson said.

AutoML Merely Expands Who Can Access AI and ML

Sengupta says the folks who are worried about AutoML replacing data scientists outright are missing the point altogether. He doesn’t think giving everyone the ability to build an AI model that creates value means we have to get rid of data scientists at all. Instead, he likens what Aible does to what the Netscape browser did for widespread internet adoption in the 1990s — it made this foreign and incredibly complex new world more accessible to everyday people.

“Every technology goes through this phase where, initially, you have these experts and only the experts can do it. But the real potential comes when everyone is empowered to leverage it. That’s what’s going to happen with AI. It has to happen,” Sengupta said. Otherwise, the power disparity between the “AI have and have-nots” will continue to grow.

Popular AutoML Tools

So what AutoML tools are available? These are just a few popular choices being used among business professionals to automate machine learning processes.

Aible

Aible’s suite of AI solutions works to automate data science and data engineering tasks across multiple industries. Its products can detect key data relationships, assess data readiness for model input and augment data analytics and recommendations. Aible connects directly to the cloud for data security, and can be integrated with other tools like Salesforce and Tableau.

AutoKeras

AutoKeras is an open-source library and AutoML tool based on Keras, a Python machine learning API. The tool can automate classification and regression tasks in deep learning models for images, text and structured data. AutoKeras largely applies neural architecture search to optimize code writing, machine learning algorithm selection and pipeline design.

Auto-PyTorch

Auto-PyTorch, built on the PyTorch machine learning library in Python, allows for fully automated deep learning (AutoDL) tasks. It automates algorithm selection and hyperparameter tuning for deep neural network architectures, and can support tabular and time series datasets. Auto-PyTorch applies Bayesian optimization, meta-learning and ensemble construction for automation.

Auto-Sklearn

Auto-Sklearn is an open-source AutoML tool built on the scikit-learn machine learning library in Python. The tool automates supervised machine learning pipeline creation and can be used as a drop-in replacement for scikit-learn classifiers in Python. Like Auto-PyTorch, Auto-Sklearn utilizes meta-learning, ensemble learning and Bayesian optimization to automatically search for learning algorithms when given a new dataset.

Google Cloud AutoML

Google Cloud AutoML is a suite of AutoML tools developed by Google that can be used to create custom machine learning models. Leading the suite is Vertex AI, a platform where models can be built for objectives like classification, regression, and forecasting in image, video, text and tabular data. Vertex AI offers pre-trained APIs and supports all open-source machine learning frameworks, including PyTorch, TensorFlow and scikit-learn.

Frequently Asked Questions

What is AutoML?

Automated machine learning (AutoML) refers to the process of automating different aspects of machine learning development, including preprocessing data, selecting models and setting hyperparameters. This makes machine learning more accessible to non-technical personnel and enables data scientists to develop high-quality models more efficiently.

How long does AutoML take?

Exactly how long AutoML takes depends largely on the amount of data being fed into the model, as well as how many different types of models are being applied. For standard, structured data sets, an AutoML model can be run in as little as a few seconds. With larger data sets, where the user wants to try out lots of different permutations of models and algorithms, it could take days or even weeks.
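
A back-of-the-envelope estimate shows why the range is so wide. The sketch below assumes an exhaustive search in which every combination of settings is trained once per cross-validation fold. The search spaces and per-fit times are invented, and many real tools cap the search with a time budget instead of trying everything.

```python
from math import prod


def estimated_search_seconds(search_space, seconds_per_fit, cv_folds=1):
    """Number of configurations x folds x time per fit, for a full grid."""
    n_configs = prod(len(values) for values in search_space.values())
    return n_configs * cv_folds * seconds_per_fit


# Hypothetical small search on a small structured data set.
small_space = {"model": ["linear", "tree"], "max_depth": [3, 5]}
small_seconds = estimated_search_seconds(small_space, seconds_per_fit=0.5)

# Hypothetical large search on a large data set with 5-fold validation.
large_space = {
    "model": ["linear", "tree", "forest", "boosting", "network"],
    "learning_rate": [0.001 * i for i in range(1, 11)],
    "num_layers": list(range(1, 9)),
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
}
large_seconds = estimated_search_seconds(large_space, seconds_per_fit=120, cv_folds=5)
large_days = large_seconds / 86_400
```

Under these assumptions the small search finishes in about two seconds, while the large one would need more than two weeks of compute if run one fit at a time.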

What is the difference between ML and AutoML?

Machine learning (ML) is a field of artificial intelligence that enables systems to learn in a way that’s similar to humans, improving their performance through data and real-world experience. AutoML is the process of automating the development of ML technology, so teams can build models without needing ML expertise.
