As originally appeared on KD Nuggets March 2020
Let’s take a closer look at the importance of ensemble learning to the overall ML process, including the bias-variance tradeoff and the three main ensemble techniques: bagging, boosting and stacking. These powerful techniques should be a part of any data scientist tool kit, as they are concepts that are encountered everywhere. Plus, understanding their underlying mechanism is at the heart of the field of machine learning.
Ensemble Learning Methods: An Overview
As we briefly mentioned above, ensemble learning is an ML paradigm where numerous base models (which are often referred to as “weak learners”) are combined and trained to solve the same problem. This method is based on the theory that by correctly combining several base models together, these weak learners can be used as building blocks for designing more complex ML models. Together, these ensemble models (aptly called “strong learners”) achieve better, more accurate results.
In other words, a weak learner is merely a base model that, alone, performs rather poorly. In fact, its accuracy level is barely above chance, meaning that it predicts the outcome only slightly better than a random guess would. These weak learners will often be computationally simple as well. Typically, the reason base models don’t perform very well by themselves is because they either have a high bias or too much variance, which makes them weak.
This is where ensemble learning comes in. This method attempts to try to reduce the general error by combining several weak learners together. Think of ensemble models as the data science version of the expression “two heads are better than one.” If one model works well, then a number of models working together can do even better.
A Word about the Bias-Variance Tradeoff
Before we get to the three main ensemble learning techniques, it’s important to introduce the concept of a weak learner and why it earned this name in the first place. As we mentioned above, the reason comes down to either bias or variance. More specifically, the prediction error of an ML model, namely the difference between the trained model and the ground truth, can be broken down into the sum of the following: the bias and the variance. For instance:
- Error due to Bias: This is the difference between our model’s expected prediction and the precise value that we are aiming to predict.
- Error due to Variance: This is the variance of a model prediction for a specified data point.
If our model is too simplistic and doesn’t have many parameters, then it may have high bias and low variance. In contrast, if our model has many parameters, then it may have high variance and low bias. As such, it’s necessary to find the right balance without underfitting or overfitting the data. This tradeoff in complexity is the reason why there exists a tradeoff between variance and bias. Simply put, an algorithm can’t simultaneously be more complex and less complex at the same time.
Ensemble Learning Techniques - Combining Weak Learners
Now that we’ve addressed the issue of bias-variance trade-off, we can continue with the three main ensemble techniques: bagging, boosting and stacking. Let’s take a look at each technique separately:
Bagging attempts to incorporate similar learners on small-sample populations and calculates the average of all the predictions. Generally, bagging allows you to use different learners in different populations. By doing so, this method helps to reduce the variance error.
Boosting is an iterative method that fine-tunes the weight of an observation according to the most recent classification. If an observation was incorrectly classified, this method will increase the weight of that observation in the next round (in which the next model will be trained) and will therefore be less prone to misclassification. Similarly, if an observation was classified correctly, then we’ll reduce its weight for the next classifier. The weight represents how important the correct classification of the specific data point should be. This enables the sequential classifiers to focus on examples previously misclassified. Generally, boosting reduces the bias error and forms strong predictive models. However, at times, they may overfit on the training data.
Stacking is a clever way of combining the information provided by different models. With this method, we use a learner of any sort to combine different learners’ outputs. The result can be a decrease in bias or variance determined by which combined models we use.
In summary, ensemble learning is about combining multiple base models to achieve a more effective and accurate ensemble model that features more powerful properties and, thus, performs better. Case in point: Ensemble learning methods have successfully set record performances on challenging datasets and are constantly a part of the winning submissions of Kaggle competitions.
It’s worth noting that even with the three main ensemble techniques that we’ve described - bagging, boosting and stacking - variants are still possible and can be better designed to more effectively adapt to specific problems, such as classifications, regression, time-series analyses, etc. But first, this requires that we fully understand the problem at hand and be creative in our approach to problem-solving! Using ensemble learning methods is a great, promising way to start approaching any problem.