In machine learning and statistics, ensemble methods draw inspiration from the principle of collective intelligence: the idea that many minds working together can reach better conclusions than any one of them alone. Ensemble learning captures this idea by combining multiple models to achieve better performance than a single model can achieve on its own.
If you’re just starting with machine learning or already have some experience and want to dive deeper, this article is here to help. We’ll break down the key concepts of ensemble learning in a clear, approachable way, backed by practical, hands-on examples in Python. By the end, you’ll not only grasp the theory behind these powerful techniques but also know exactly how to implement them in your own projects.
What Is Ensemble Learning?
Ensemble learning is a machine learning technique that employs multiple models to achieve better performance than a single model can achieve on its own. In this way, it can overcome the limitations of various individual models.
Advantages of Ensemble Learning
Rather than relying on a single model to deliver the best predictions, ensemble learning combines the outputs of multiple models — often referred to as “base” or “weak” learners. The goal is to draw on their collective strengths to achieve better predictive performance than any individual model could provide.
Let’s break it down further. When we train a single model, like a decision tree or a support vector machine (SVM), we’re essentially searching a hypothesis space to find the best fit for our data. However, finding that “perfect” hypothesis is often challenging due to limitations in the data, the algorithm or the available computational power. Ensemble methods tackle this problem by combining multiple hypotheses into one cohesive model that, in theory, should perform better.
Essentially, this approach takes advantage of the diversity of its base models. Each model might excel in different aspects of the problem or make different types of errors. By combining them, ensemble methods reduce these errors and produce a final model that is more accurate.
To make this work, ensemble learning often relies on training diverse weak learners, models that individually suffer from high bias or high variance. On their own, these weak learners may not perform well, but when aggregated, via averaging, voting or weighted mechanisms, they produce a model with strong predictive accuracy.
In essence, ensemble learning operates on the principle that the collective decision of many “weak” models can outperform a single “strong” model, especially in cases where the problem is complex or the data is noisy.
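To see why aggregation helps, consider a toy simulation (an illustration only, not a real ensemble pipeline): several models each predict the same true value, but with independent random errors, and averaging their outputs cancels much of that error out.

import numpy as np

rng = np.random.default_rng(42)

true_value = 10.0
n_models = 25

# Each "weak" model predicts the true value plus its own independent random error
individual_predictions = true_value + rng.normal(loc=0.0, scale=2.0, size=n_models)

# Compare the error of one model against the error of the averaged ensemble
single_model_error = abs(individual_predictions[0] - true_value)
ensemble_error = abs(individual_predictions.mean() - true_value)

print(f'Single model error: {single_model_error:.2f}')
print(f'Ensemble (average) error: {ensemble_error:.2f}')

Because the errors are independent, the averaged prediction’s error shrinks as more models are added, which is exactly the behavior ensemble methods try to exploit on real data.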
When Should You Use Ensemble Learning?
Ensemble learning is particularly effective in scenarios involving noisy or imbalanced data.
Noisy Data
Noisy data refers to data sets that contain errors, outliers or irrelevant information that can obscure meaningful patterns. Models trained on such data often struggle to generalize well, resulting in high variance, where the model performs well on training data but poorly on unseen data.
Ensemble methods like bagging counteract this by training multiple models on different subsets of data. Each model learns a slightly varied perspective, and their combined predictions smooth out the effect of noise. Random forest, a widely used bagging technique, exemplifies this approach by averaging predictions across many decision trees, making the final model less sensitive to outliers and errors. This aggregation reduces variance, leading to more stable and accurate predictions.
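As a quick sketch of this effect, the snippet below compares a single decision tree with a random forest on a synthetic data set where flip_y injects deliberate label noise; the data and settings are illustrative assumptions, so treat the exact accuracy numbers as indicative rather than definitive.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic data with label noise (flip_y) standing in for a noisy data set
X, y = make_classification(
    n_samples=1000, n_features=20, flip_y=0.1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A single tree tends to overfit the noise more than the averaged forest
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print(f'Single tree accuracy: {accuracy_score(y_test, tree.predict(X_test)):.2f}')
print(f'Random forest accuracy: {accuracy_score(y_test, forest.predict(X_test)):.2f}')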
Imbalanced Data Sets
In contrast, imbalanced data sets pose a different challenge: one class significantly outnumbers the others. For example, in fraud detection, legitimate transactions vastly outnumber fraudulent ones. A naive model may simply predict the majority class to achieve high overall accuracy, neglecting the minority class entirely. This leads to high bias, where the model fails to learn meaningful distinctions for the minority class.
Boosting techniques like AdaBoost and gradient boosting address this by sequentially training models that focus on misclassified samples. This iterative process prioritizes learning from the minority class, correcting the model’s bias. Meanwhile, bagging methods can also assist by generating balanced subsets of data for training, ensuring that the minority class is adequately represented.
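The sketch below shows this on a synthetic, heavily imbalanced data set; the class proportions and model settings are arbitrary choices for demonstration. The per-class recall in the report is the number to watch, since overall accuracy can look deceptively high when one class dominates.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic imbalanced data: roughly 95% majority class, 5% minority class
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.95], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

boosting_clf = AdaBoostClassifier(n_estimators=100, random_state=42)
boosting_clf.fit(X_train, y_train)

# The recall row for class 1 shows how well the minority class is recovered
print(classification_report(y_test, boosting_clf.predict(X_test)))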
Types of Ensemble Learning
We can approach ensemble learning through a variety of techniques, ranging from simple methods to more sophisticated algorithms. Below, we’ll explore both simple and advanced methods, demonstrating how to apply them in practice.
Simple ensemble methods are intuitive and easy to implement, making them a great starting point for beginners.
Max Voting
In classification problems, each base model in the ensemble votes on the predicted class, and the class with the majority of votes becomes the final prediction. This method works well when all base models have similar performance and contribute equally to the decision-making process.
from collections import Counter

def max_voting(
    model1_predictions: list[int],
    model2_predictions: list[int],
    model3_predictions: list[int],
) -> list[int]:
    final_predictions = []
    for p1, p2, p3 in zip(model1_predictions, model2_predictions, model3_predictions):
        # Count the three votes and keep the most common class
        vote_counts = Counter([p1, p2, p3])
        final_predictions.append(vote_counts.most_common(1)[0][0])
    return final_predictions
# Example
predictions1 = [0, 1, 0]
predictions2 = [1, 1, 0]
predictions3 = [0, 1, 1]
final_predictions = max_voting(predictions1, predictions2, predictions3)
print(final_predictions)
# Output: [0, 1, 0]
Averaging
For regression tasks, the outputs of all models are averaged to produce the final prediction. This smooths out individual errors and produces a more accurate overall prediction. For instance, when predicting house prices, averaging the outputs of three different models can yield a more reliable result than relying on any one of them.
def average_ensemble(
    model1_predictions: list[float],
    model2_predictions: list[float],
    model3_predictions: list[float],
) -> list[float]:
    final_predictions = []
    for p1, p2, p3 in zip(model1_predictions, model2_predictions, model3_predictions):
        # Simple mean of the three model outputs, rounded for readability
        average_prediction = round((p1 + p2 + p3) / 3, 2)
        final_predictions.append(average_prediction)
    return final_predictions
# Example
predictions1 = [0.1, 1.2, 0.3]
predictions2 = [1.1, 1.3, 0.4]
predictions3 = [0.2, 1.1, 1.5]
final_predictions = average_ensemble(predictions1, predictions2, predictions3)
print(final_predictions)
# Output: [0.47, 1.2, 0.73]
Weighted Average
This is an extension of the averaging technique we explored earlier, where each model’s output is weighted according to its performance. Models with higher accuracy or lower error rates receive more weight in the final prediction. This approach is particularly useful when some models in the ensemble consistently outperform others.
def weighted_average_ensemble(
    model1_predictions: list[float],
    model2_predictions: list[float],
    model3_predictions: list[float],
    weights: tuple[float, float, float],
) -> list[float]:
    final_predictions = []
    w1, w2, w3 = weights
    for p1, p2, p3 in zip(model1_predictions, model2_predictions, model3_predictions):
        # Weight each model's output, then normalize by the total weight
        weighted_average = round((p1 * w1 + p2 * w2 + p3 * w3) / (w1 + w2 + w3), 2)
        final_predictions.append(weighted_average)
    return final_predictions
# Example
predictions1 = [0.1, 1.2, 0.3]
predictions2 = [1.1, 1.3, 0.4]
predictions3 = [0.2, 1.1, 1.5]
weights = (0.2, 0.5, 0.3)
final_predictions = weighted_average_ensemble(predictions1, predictions2, predictions3, weights)
print(final_predictions)
# Output: [0.63, 1.22, 0.71]
Advanced Ensemble Learning Techniques
Advanced methods build on the principles of simple techniques, offering more sophistication and flexibility. They often leverage model diversity and iterative improvement to address complex problems.
Bagging
Bagging (short for bootstrap aggregating) creates multiple subsets of the training data by sampling with replacement. Each subset trains an independent base model, and their predictions are aggregated, typically through averaging for regression problems or voting for classification tasks.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Bagging with decision trees
bagging_clf = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    random_state=42
)
bagging_clf.fit(X_train, y_train)
y_pred_bagging = bagging_clf.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred_bagging):.2f}')
Boosting
Boosting tackles errors iteratively by training base models sequentially. Each new model focuses on correcting the mistakes of the previous ones. Techniques like AdaBoost assign higher weights to misclassified samples, ensuring they receive more attention in subsequent iterations.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import AdaBoostClassifier

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# AdaBoost with scikit-learn's default base estimator (a shallow decision tree)
boosting_clf = AdaBoostClassifier(
    n_estimators=50,
    random_state=42
)
boosting_clf.fit(X_train, y_train)
y_pred_boosting = boosting_clf.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred_boosting):.2f}')
Stacking
Stacking feeds the predictions of multiple base models into a meta-model, often a simpler algorithm like logistic or linear regression, that learns how best to combine them. Unlike bagging and boosting, stacking allows for heterogeneous base models, such as decision trees, support vector machines and neural networks, all working together.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import StackingClassifier

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Heterogeneous base models combined by a logistic regression meta-model
stack_clf = StackingClassifier(
    estimators=[
        ('dt', DecisionTreeClassifier()),
        ('svc', SVC(probability=True))
    ],
    final_estimator=LogisticRegression()
)
stack_clf.fit(X_train, y_train)
stack_preds = stack_clf.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, stack_preds):.2f}')
Blending
Blending is a variant of stacking that trains the meta-model on base model predictions made for a held-out validation set, rather than on cross-validated predictions over the entire training set. This reduces the risk of information leakage, but because the meta-model only sees the smaller validation set, it typically ends up slightly less accurate.
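Scikit-learn doesn’t ship a ready-made blending estimator, so the sketch below implements the idea by hand on the same iris data used in the earlier examples; the split sizes and choice of base models are illustrative assumptions rather than requirements.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# Hold out a test set, then carve a validation ("blend") set out of the rest
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.25, random_state=42
)

# Base models are fit on the training split only
base_models = [
    DecisionTreeClassifier(random_state=42),
    SVC(probability=True, random_state=42),
]
for model in base_models:
    model.fit(X_train, y_train)

# Their predicted probabilities on the validation split become the meta-model's features
val_features = np.hstack([m.predict_proba(X_val) for m in base_models])
meta_model = LogisticRegression(max_iter=1000).fit(val_features, y_val)

# At prediction time, the same base models feed the meta-model
test_features = np.hstack([m.predict_proba(X_test) for m in base_models])
blend_preds = meta_model.predict(test_features)
print(f'Accuracy: {accuracy_score(y_test, blend_preds):.2f}')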
Get Started With Ensemble Learning
Ensemble learning is a practical and powerful approach to improving model performance by letting multiple models work together. Whether you’re tackling a challenging classification problem or trying to make sense of noisy data, ensembles give you a reliable toolkit to achieve better results without overcomplicating things.
With the techniques covered in this article, you now have a solid foundation to start exploring ensemble learning in your own projects. Experiment with the simple methods first, like averaging, and then dive into more advanced approaches like bagging and boosting. You’ll soon see how combining models can make a real difference in real-world problems!
Frequently Asked Questions
What is the difference between deep learning and ensemble learning?
Deep learning uses neural networks with many layers to learn complex patterns directly from raw data, excelling in tasks like image recognition or natural language processing. It relies heavily on large data sets and computational power to train deep architectures.
Ensemble learning, on the other hand, combines multiple models to improve accuracy and robustness. Instead of creating a single complex model, it aggregates simpler models to reduce errors. It’s often used in tasks with structured data or where combining predictions leads to better results.
In short, deep learning focuses on feature learning, while ensemble learning emphasizes combining models for improved performance.
What are the three types of ensemble learning?
- Bagging: Models are trained independently on different subsets of data, and their predictions are averaged (regression) or voted (classification).
- Boosting: Models are trained sequentially, each focusing on errors made by the previous ones. This reduces bias and improves predictions.
- Stacking: Predictions from multiple models are combined using another model (meta-model) to improve performance. This approach allows for using diverse base models, such as neural networks and decision trees, together.