How else could you analyze 36,000 naked mole rat chirps to find out what they’re talking about?
Or translate your cat’s purr or meow to know it’s “just chilling”?
Or auto-generate an image just by typing in the words “giant squid assembling Ikea furniture”?
Thanks to different types of machine learning, that’s all seemingly possible.
4 Types of Machine Learning
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning
What Is Machine Learning?
Machine learning is a branch of artificial intelligence in which algorithms identify patterns in data and then use those patterns to make accurate predictions or complete a given task. Rather than being explicitly programmed with step-by-step rules, the system relies on algorithms and statistical models to learn these patterns from the data. It’s then further optimized through trial and error and feedback, meaning machines learn by experience and increased exposure to data, much the same way humans do.
Today, machine learning is a popular tool used in a range of industries, from detecting fraud in banking and insurance to forecasting trends in healthcare to helping smart devices quickly process human conversations through natural language processing.
4 Types of Machine Learning (With Examples)
Supervised Learning
Supervised learning involves training a machine and its algorithm using labeled training data, meaning each training example is paired with the correct output. It’s one of the most popular forms of machine learning and is used to train models for classification and regression tasks, including forecasting. Supervised learning is commonly used to create recommender systems, detect inbox spam and predict stock and housing market values.
Because it relies on labeled data sets, supervised learning requires a significant amount of human intervention. Data must be divided into features (the input data) and labels (the output data).
Features are individual, measurable properties of the data, such as height, salary, color or animal breed.
Labels identify the specific characteristic or outcome associated with each data point, and they are often assigned manually by humans to help explain the context of the data to the machine. For example, a label might indicate whether there’s a dog in a picture, or whether the word “hello” is spoken in an audio clip. This teaches the machine which elements it needs to recognize, so it can identify those elements in new, unlabeled data in the future.
During training, labeled input and output data is fed into the system again and again: the model makes predictions, compares them with the correct labels and adjusts itself to reduce its errors. As a result, predictions tend to become more accurate with each pass over the data and as new labeled data is added. Humans also provide feedback on the accuracy of the machine learning algorithm during this process, which helps it learn over time.
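To make the feed, compare and adjust loop concrete, here is a minimal sketch using a perceptron. This is one very simple supervised algorithm that the article doesn’t name. The tiny data set of (features, label) pairs, the spam/not-spam meaning of the labels and the `train`/`predict` helpers are all hypothetical and exist only for illustration.

```python
# Hypothetical labeled data: each example is (features, label).
training_data = [
    ((2.0, 1.0), 1),   # label 1 = "spam"
    ((3.0, 2.5), 1),
    ((0.5, 0.2), 0),   # label 0 = "not spam"
    ((1.0, 0.1), 0),
]

def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features is above 0, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0

def train(data, epochs=20, learning_rate=0.1):
    """Perceptron training: repeatedly compare predictions with labels and adjust."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):                     # feed the labeled data in again and again
        for features, label in data:
            error = label - predict(weights, bias, features)  # 0 when the prediction is right
            # Nudge weights and bias toward the correct answer (no change if error == 0).
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

weights, bias = train(training_data)
print([predict(weights, bias, f) for f, _ in training_data])  # → [1, 1, 0, 0]
```

Here the features are the two numbers in each tuple and the labels are the 0/1 values. Once trained, the same `predict` call can be applied to new, unlabeled feature tuples.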
Supervised learning, like each of these machine learning types, serves as an umbrella for specific algorithms and statistical methods. Here are a few that fall under supervised learning.
Classification
Classification algorithms assign data to predefined categories, which makes them a useful tool for sorting, and even hiding, data. (If you use Gmail or any other large email client, you may notice that some emails are automatically redirected to a spam or promotions folder, essentially hiding those emails from view.)
A few popular classification algorithms used to sort data include K-nearest neighbor (KNN), naive Bayes classifier algorithms, support vector machine (SVM) algorithms, decision trees and random forest models.
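To illustrate the first algorithm on that list, the sketch below implements a bare-bones K-nearest neighbor classifier in plain Python. A new point gets the majority label of the k labeled points closest to it. The email features (link and exclamation-mark counts) and the `knn_predict` function are hypothetical. Production code would typically use a library implementation instead.

```python
import math
from collections import Counter

def knn_predict(training_data, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled neighbors."""
    # Sort the labeled points by Euclidean distance to the query point.
    neighbors = sorted(training_data, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Hypothetical email features: (number of links, number of exclamation marks).
emails = [
    ((8, 5), "spam"), ((7, 6), "spam"), ((9, 4), "spam"),
    ((1, 0), "inbox"), ((0, 1), "inbox"), ((2, 1), "inbox"),
]
print(knn_predict(emails, (6, 5)))  # → spam
print(knn_predict(emails, (1, 1)))  # → inbox
```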
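Regression
Regression algorithms are frequently used for forecasting trends. These algorithms identify the relationship between an outcome (the dependent variable) and one or more independent variables in order to predict numerical values. Linear regression is among the most widely used, and other common regression algorithms include ridge regression and lasso regression. Logistic regression is also often listed alongside them, although, despite its name, it’s typically used for classification rather than for predicting continuous values.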
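In simple linear regression, a single feature acts as the x variable and the label acts as the y variable. The algorithm fits the straight line y = mx + b that best predicts y from x, usually by minimizing the squared error.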
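The sketch below fits a simple linear regression with the ordinary least-squares formulas. It assumes a small, hypothetical data set of house sizes (the feature, x) and prices (the label, y). `fit_simple_linear_regression` is an illustrative helper, not a library function.

```python
def fit_simple_linear_regression(xs, ys):
    """Return slope m and intercept b minimizing squared error for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    b = mean_y - m * mean_x
    return m, b

# Hypothetical data: house size in hundreds of sq ft (feature x) vs. price in $1,000s (label y).
sizes = [10, 15, 20, 25, 30]
prices = [200, 270, 340, 410, 480]
m, b = fit_simple_linear_regression(sizes, prices)
print(m, b)        # → 14.0 60.0
print(m * 22 + b)  # predicted price for a 2,200 sq ft house → 368.0
```

The data here lies exactly on a line, so the fit is perfect. With real, noisy data the same formulas return the line with the smallest total squared error.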
Unsupervised Learning
With unsupervised learning, the system processes raw data that’s neither labeled nor tagged, meaning less work for humans. Unsupervised learning algorithms discover patterns or anomalies in large, often unstructured data sets that might otherwise go undetected by humans. This makes unsupervised learning well suited to tasks such as clustering and dimensionality reduction.
Unsupervised learning algorithms work by analyzing available data and grouping information based on similarities and differences, thereby revealing relationships between data points. Customer and audience segmentation, computer vision and breach detection can all apply unsupervised learning.
Two of the most common types of unsupervised learning methods are clustering and dimensionality reduction.
Clustering
Clustering algorithms are among the most widely used examples of unsupervised machine learning. These algorithms focus on similarities within raw data and then group that information accordingly. More simply, these algorithms provide structure to raw data. Clustering algorithms are often used with marketing data to garner customer (or potential customer) insights, as well as for fraud detection.
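Here is a minimal sketch of k-means, one common clustering algorithm (the article doesn’t name a specific one). It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The customer data points are hypothetical. For simplicity, the sketch uses hand-picked starting centroids, whereas real implementations usually choose them randomly or with k-means++.

```python
import math

def k_means(points, centroids, iterations=10):
    """Group points around len(centroids) centers; returns (centroids, labels)."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for i, c in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid where it was
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers: (visits per month, average spend in $100s).
points = [(1, 2), (1.5, 1.8), (1, 0.6), (8, 8), (9, 11), (8, 9)]
centroids, labels = k_means(points, [points[0], points[3]])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

No labels are supplied. The two customer groups emerge purely from how close the points are to one another.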
Dimensionality Reduction
Dimensionality reduction is the process of reducing the number of features in a data set while preserving important properties of the data. This is done to reduce processing time, storage space, complexity and overfitting in a machine learning model.
The two main methods for applying dimensionality reduction are feature selection and feature extraction. Feature selection involves choosing a subset of relevant features from the original feature set to use as input to a model, which helps simplify the model and can improve the accuracy of its outputs. Feature extraction involves deriving new features from the original raw data, typically by combining or transforming the original features, which condenses redundant information into a smaller set of more informative inputs.
Popular dimensionality reduction algorithms include principal component analysis (PCA), non-negative matrix factorization (NMF), linear discriminant analysis (LDA) and generalized discriminant analysis (GDA). LDA and GDA use class labels, so strictly speaking they are supervised techniques.
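The two approaches can be contrasted in a short sketch under some simplifying assumptions. Feature selection is shown as a variance threshold that drops columns that barely change. Feature extraction is shown as a one-component PCA computed with power iteration in plain Python. The data and helper names are hypothetical, and a real project would normally use a numerical library.

```python
import math

def variance(column):
    """Population variance of a sequence of numbers."""
    mean = sum(column) / len(column)
    return sum((v - mean) ** 2 for v in column) / len(column)

def select_features(rows, threshold):
    """Feature selection: keep only the original columns whose variance exceeds `threshold`."""
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if variance(col) > threshold]
    return [[row[j] for j in keep] for row in rows], keep

def first_principal_component(rows, steps=100):
    """Feature extraction: replace each row with one new feature, its coordinate
    along the first principal component (PCA), found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    # Covariance matrix of the centered data (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    # Power iteration: repeated multiplication by the covariance matrix converges to
    # its dominant eigenvector, the direction of greatest variance.
    vec = [1.0] * d
    for _ in range(steps):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    projected = [sum(c * v for c, v in zip(row, vec)) for row in centered]
    return projected, vec

# Hypothetical data: columns 0 and 1 are strongly correlated, column 2 never changes.
rows = [[1.0, 2.0, 5.0], [2.0, 4.1, 5.0], [3.0, 5.9, 5.0], [4.0, 8.0, 5.0]]
selected, kept = select_features(rows, threshold=0.01)
print(kept)  # → [0, 1] (the constant column 2 is dropped)
projected, direction = first_principal_component(selected)
print([round(p, 2) for p in projected])  # each row is now a single extracted feature
```

Selection keeps two of the original columns unchanged. Extraction then compresses those two correlated columns into one new feature that did not exist in the raw data.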
Semi-Supervised Learning
Semi-supervised learning offers a balanced mix of both supervised and unsupervised learning. In this hybrid approach, a small amount of labeled data is processed alongside a larger amount of unlabeled, raw data. Compared with unsupervised learning algorithms, this strategy essentially gives algorithms a head start when it comes to identifying relevant patterns and making accurate predictions, without the time, effort and cost associated with more labor-intensive supervised learning algorithms.
Semi-supervised learning is used in applications ranging from fraud detection to speech recognition and text document classification. Because semi-supervised learning uses both labeled and unlabeled data, it often relies on modified supervised and unsupervised algorithms trained on both data types.
Here are a few algorithms that fall under semi-supervised learning.
Self-Training
Self-training algorithms start with a supervised classifier model, known as a pseudo-labeler, that’s trained on the small labeled portion of a data set. The pseudo-labeler then makes predictions on the remainder of the data set, which is unlabeled. The labels produced this way are called pseudo-labels. Typically only the most confident ones are added back into the labeled data set, and the classifier is retrained on the expanded set. This cycle repeats until all data samples are labeled or no remaining predictions are confident enough to add, which can improve the model’s accuracy over time.
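The self-training loop can be sketched as follows, under several assumptions. A simple nearest-centroid classifier stands in for the pseudo-labeler. Its “confidence” is a hypothetical score based on the relative distance to the two closest centroids. The 0.7 threshold and the example points are arbitrary.

```python
import math

def train_centroids(labeled):
    """Fit a nearest-centroid classifier: one mean point per label."""
    groups = {}
    for point, label in labeled:
        groups.setdefault(label, []).append(point)
    return {label: tuple(sum(dim) / len(pts) for dim in zip(*pts))
            for label, pts in groups.items()}

def predict_with_confidence(centroids, point):
    """Return (label, confidence); confidence nears 1 when one centroid is much closer."""
    dists = sorted((math.dist(point, c), label) for label, c in centroids.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    confidence = 1.0 if d1 + d2 == 0 else 1 - d1 / (d1 + d2)
    return label, confidence

def self_train(labeled, unlabeled, threshold=0.7, max_rounds=10):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        centroids = train_centroids(labeled)        # (re)train the pseudo-labeler
        confident, remaining = [], []
        for point in unlabeled:
            label, conf = predict_with_confidence(centroids, point)
            (confident if conf >= threshold else remaining).append((point, label))
        if not confident:                           # nothing confident enough to add
            break
        labeled += confident                        # add pseudo-labels to the labeled set
        unlabeled = [p for p, _ in remaining]
        if not unlabeled:                           # everything has been labeled
            break
    return labeled, unlabeled

labeled_data = [((1.0, 1.0), "cat"), ((9.0, 9.0), "dog")]
unlabeled_data = [(2.0, 1.5), (1.5, 2.5), (8.0, 8.5), (8.5, 7.0), (5.0, 5.0)]
labeled, still_unlabeled = self_train(labeled_data, unlabeled_data)
print(len(labeled), still_unlabeled)  # → 6 [(5.0, 5.0)]
```

The ambiguous point halfway between the two groups never becomes confident enough to pseudo-label, so it is left unlabeled rather than guessed.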
Label Propagation
Label propagation algorithms assign labels to unlabeled observations by propagating, or spreading, labels through a data set that is represented as a graph. The process starts with a small set of nodes that already have labels and iteratively spreads those labels along the graph’s connections, so unlabeled nodes take on the labels of the nodes they’re most strongly connected to. Label propagation can be used to quickly identify communities, detect abnormal behavior or accelerate marketing campaigns. For example, if one customer on a graph likes a certain product, a customer directly connected to them may also like it.
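Here is a minimal sketch of classic iterative label propagation on a small, hypothetical customer graph. Each unlabeled node repeatedly averages its neighbors’ label scores, while the seed nodes keep (“clamp”) their original labels. The node names, labels and `propagate_labels` helper are illustrative only.

```python
def propagate_labels(edges, seeds, iterations=20):
    """Spread seed labels across an undirected graph.

    edges: list of (node, node) pairs; seeds: {node: label} for the labeled nodes
    (every seed is assumed to appear in at least one edge).
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    labels = sorted(set(seeds.values()))
    # scores[node][label]: seeds start at 1.0 for their own label; everything else at 0.
    scores = {n: {lab: 0.0 for lab in labels} for n in neighbors}
    for node, label in seeds.items():
        scores[node][label] = 1.0
    for _ in range(iterations):
        new_scores = {}
        for node in neighbors:
            if node in seeds:
                new_scores[node] = scores[node]  # clamp labeled nodes
            else:
                nbrs = neighbors[node]
                new_scores[node] = {lab: sum(scores[n][lab] for n in nbrs) / len(nbrs)
                                    for lab in labels}
        scores = new_scores
    # Each node takes the label with the highest score (ties go to the first label).
    return {n: max(labels, key=lambda lab: s[lab]) for n, s in scores.items()}

# Two tight groups of customers joined by a single cy-dev connection.
edges = [("ana", "ben"), ("ben", "cy"), ("cy", "ana"), ("cy", "dev"),
         ("dev", "eli"), ("eli", "fay"), ("fay", "dev")]
seeds = {"ana": "likes_hiking", "fay": "likes_gaming"}
print(propagate_labels(edges, seeds))
```

Only two customers start with labels. After propagation, each of the remaining customers inherits the label that dominates its own side of the graph.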
Reinforcement Learning
With reinforcement learning, AI-powered software programs, commonly referred to as intelligent agents, perceive their surrounding environment (often through sensors) and respond to it by making decisions independently to achieve a desired outcome. (Think simulations, computer games and the real world.)
Intelligent agents are self-trained by being rewarded for desired behaviors and punished for undesired ones. By perceiving and interacting with their environment, these agents learn through trial and error, gradually favoring the behaviors that earn the most reward. Reinforcement learning is often used in robotics and self-driving cars, helping machines acquire specific skills and behaviors.
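Learning from rewards through trial and error can be illustrated with one of the simplest reinforcement learning setups: a multi-armed bandit solved with an epsilon-greedy strategy. This technique is not named in the article. The reward function is hypothetical: action 0 is punished, and actions 1 and 2 are rewarded by different amounts.

```python
import random

def epsilon_greedy_agent(reward_for_action, n_actions, steps=500, epsilon=0.1, seed=0):
    """Learn which action earns the most reward through trial and error.

    reward_for_action(action) returns a reward (positive = rewarded, negative = punished).
    """
    rng = random.Random(seed)
    value = [0.0] * n_actions  # running average reward observed for each action
    count = [0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                       # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: value[a])  # exploit: best action so far
        reward = reward_for_action(action)
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]   # update the running average
    return value

values = epsilon_greedy_agent(lambda a: [-1.0, 0.5, 2.0][a], n_actions=3)
print(values)
```

The occasional random exploration is what lets the agent discover that action 2 pays more than action 1. A purely greedy agent could settle on the first rewarding action it found.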
These are some of the algorithms that fall under reinforcement learning.
Q-Learning
Q-learning is a model-free reinforcement learning algorithm, meaning it does not require a model of the intelligent agent’s environment. Q-learning algorithms iteratively estimate the value of taking each action in each situation (state), based on the rewards resulting from those actions, which improves outcomes and behaviors over time.
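The sketch below shows tabular Q-learning on a hypothetical five-state corridor where the agent is rewarded only for reaching the rightmost state. The environment and the values of the learning rate `alpha`, discount factor `gamma` and exploration rate `epsilon` are illustrative assumptions, not settings from any particular system. The agent is never given a model of the corridor; it only observes the states and rewards its actions produce.

```python
import random

def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a corridor of states 0..n_states-1.

    Actions: 0 = left, 1 = right. Reaching the last state earns +1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action] = estimated value
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy: usually take the best-known action, sometimes explore.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)  # explore (or break a tie) at random
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Q-learning update: move the estimate toward reward + discounted best next value.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q

q = q_learning()
policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(policy)  # → ['right', 'right', 'right', 'right']
```

The discount factor makes reward reached sooner worth more, so “right” ends up valued higher than “left” in every non-goal state.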
Deep Reinforcement Learning
Used in the development of self-driving cars, video games and robots, deep reinforcement learning combines deep learning (machine learning based on artificial neural networks) with reinforcement learning, in which an agent’s actions in its environment are either rewarded or punished. Deep reinforcement learning typically requires vast amounts of data and considerable computing power.
Frequently Asked Questions
What is machine learning?
Machine learning is a subfield of artificial intelligence (AI) where systems learn from experience and optimize processes through exposure to data, all without explicit programming.
What are the four types of machine learning?
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning