Let’s Talk About Machine Learning Ensemble Learning In Python
Build Better Predictive Models By Efficiently Combining Classifiers Into A Meta-Classifier
Farhad Malik, Jun 7

Learning about ensembles is important for anyone who wants an advanced understanding of machine learning concepts.
This article will focus on the techniques and methods for combining a set of classifiers to improve the performance of your machine learning solution.
Ensemble methods improve generalisation of the machine learning solution.
As a consequence, they help to reduce over-fitting.

What Are Ensemble Methods?
The goal is to create a meta-classifier.
This meta-classifier has a better generalisation performance than the individual classifiers.
Think of an ensemble meta-classifier as a solution in which a number of classifiers are combined to produce predictions that are more accurate and robust than those of any individual classifier.

Majority Voting Principle
One of the most famous ensemble methods is based on the majority voting principle.
Imagine we have a number of judges who are voting on the candidates in a dancing competition.
It is likely that the combined verdict of the judges will give us better candidates than the verdict of any individual judge.
The key to the majority voting principle is to take the predicted value that received the most votes.
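A minimal sketch of this principle in plain Python, assuming each judge's vote is simply a label. The `majority_vote` helper and the candidate names are hypothetical and serve only as an illustration.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label that received the most votes."""
    if not votes:
        raise ValueError("votes must not be empty")
    # most_common(1) returns a one-element list holding the
    # (label, count) pair with the highest count.
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Five hypothetical judges each name their preferred candidate.
judge_votes = ["Alice", "Bob", "Alice", "Carol", "Alice"]
winner = majority_vote(judge_votes)
print(winner)  # Alice has three of the five votes
```

The same helper works for classifier outputs: pass in the list of labels predicted by each model for one sample.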

How Does Ensemble Work?
Let’s consider that there are five classification algorithms:
- Random forests
- Support vector machines
- Logistic regression classifier
- Boosted trees
- Nearest neighbour
Now what we could do is build an ensemble that combines all five classification algorithms into one.
We can also try fitting the same algorithm on different subsets of our training set and combining the resulting models to get better predictions.
Each of the classification algorithms will produce its own predictions.
We can then apply the majority voting principle and take the prediction that occurred most often.
When we combine the classifiers, the overall error rate tends to be lower than the error rate of each individual classifier, provided the classifiers are better than random guessing and do not all make the same mistakes.
For example, let’s assume we are predicting whether a stock price will increase or decrease.
Three out of five of the algorithms predict that the price will go up.
Therefore, we will take the final prediction that the stock price will increase.
Hence ensemble methods tend to produce better predictions and are more robust than individual classifiers.
Majority voting is essentially built on the mathematical concept of the mode, which is the most frequently occurring value in a set.
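The claim that combining classifiers lowers the error rate is easiest to check under a simplifying assumption that the article does not state: the problem is binary, every classifier has the same error rate, and the classifiers make their mistakes independently of each other. Under that assumption the majority vote is wrong only when more than half of the classifiers are wrong at once, which is a binomial tail probability. The function name and the 0.25 error rate below are illustrative choices.

```python
from math import comb


def ensemble_error(n_classifiers, base_error):
    """Probability that a majority of independent classifiers is wrong.

    Assumes a binary problem, an odd number of classifiers, and that each
    classifier errs independently with probability `base_error`.
    """
    # Smallest number of wrong votes that forms a majority.
    k_min = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k)
        * base_error ** k
        * (1 - base_error) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )


# Five classifiers, each wrong 25% of the time (illustrative value).
print(f"{ensemble_error(5, 0.25):.3f}")  # about 0.104, well below 0.25
```

The benefit depends on the assumptions: with a base error above 0.5 the same formula gives an ensemble that is worse than its members, and real classifiers trained on the same data are rarely fully independent.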

Let’s Implement An Ensemble Classifier In Python
Now that we understand how and why ensemble classifiers work, let’s implement one in Python.
1. We are going to use the scikit-learn Python library.
2. We will be using the cross_val_score function to get the accuracy.
3. We will use the following scikit-learn algorithms:
   - K Neighbours Classifier
   - Random Forest Classifier
   - Decision Trees Classifier
4. Finally, we will use an ensemble to compute the accuracy score via the majority voting principle. We will use the VotingClassifier.

    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import VotingClassifier

    # get_data() stands for your own function that returns features and labels
    X, y = get_data()

    model1 = KNeighborsClassifier(n_neighbors=3, metric='minkowski')
    model2 = RandomForestClassifier(n_estimators=50, random_state=0)
    model3 = DecisionTreeClassifier(max_depth=1, criterion='entropy')

    # Hard voting: the ensemble predicts the class that most of the models vote for
    classifier = VotingClassifier(
        estimators=[('kn', model1), ('rf', model2), ('dt', model3)],
        voting='hard')

    # Print the mean 3-fold cross-validated accuracy of each model and of the ensemble
    for model in [model1, model2, model3, classifier]:
        scores = cross_val_score(model, X, y, cv=3, scoring='accuracy')
        print("Accuracy: %0.2f" % scores.mean())

After running the code, I found that the mean accuracy of the ensemble model (VotingClassifier) was better than that of the other models.
This suggests that the predictive performance of an ensemble classifier can be superior to the performance of the individual classifiers.
VotingClassifier is useful when we have a number of equally well performing models as it can balance out their individual weaknesses.
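Because `get_data()` is left undefined in the listing, here is a self-contained variant that can be run as is. It swaps in scikit-learn's bundled Iris data set, which is an assumption made for illustration and not the data used in this article, so the scores it prints will not match the ones I saw. The variable names and the added `random_state` on the decision tree are also illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: Iris stands in for the article's unspecified data set.
X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=3)
forest = RandomForestClassifier(n_estimators=50, random_state=0)
stump = DecisionTreeClassifier(max_depth=1, criterion="entropy", random_state=0)

# Hard voting: the ensemble predicts the class chosen by most of the models.
ensemble = VotingClassifier(
    estimators=[("kn", knn), ("rf", forest), ("dt", stump)],
    voting="hard",
)

named_models = [
    ("k-nearest neighbours", knn),
    ("random forest", forest),
    ("decision tree", stump),
    ("voting ensemble", ensemble),
]

# Mean 3-fold cross-validated accuracy for each model and for the ensemble.
mean_accuracy = {}
for name, model in named_models:
    scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
    mean_accuracy[name] = scores.mean()
    print(f"{name}: {mean_accuracy[name]:.3f}")
```

On a different data set the ranking may change; a voting ensemble is not guaranteed to beat its best member, especially when one of the models is much weaker than the others.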

Summary
This article highlighted one of the most useful families of methods in machine learning, known as ensemble methods.
It illustrated how ensemble methods combine multiple estimators into one model to improve generalisation and robustness over a single estimator.
Hope it helps.