How to Develop a Stacking Ensemble for Deep Learning Neural Networks in Python With Keras

In this case, the level 1, or meta-learner, model learns to correct the predictions from the level 0 model.… although it can also be used when one has only a single generalizer, as a technique to improve that single generalizer— Stacked generalization, 1992.It is important that the meta-learner is trained on a separate dataset to the examples used to train the level 0 models to avoid overfitting.A simple way that this can be achieved is by splitting the training dataset into a train and validation set..The level 0 models are then trained on the train set..The level 1 model is then trained using the validation set, where the raw inputs are first fed through the level 0 models to get predictions that are used as inputs to the level 1 model.A limitation of the hold-out validation set approach to training a stacking model is that level 0 and level 1 models are not trained on the full dataset.A more sophisticated approach to training a stacked model involves using k-fold cross-validation to develop the training dataset for the meta-learner model..Each level 0 model is trained using k-fold cross-validation (or even leave-one-out cross-validation for maximum effect); the models are then discarded, but the predictions are retained..This means for each model, there are predictions made by a version of the model that was not trained on those examples, e.g..like having holdout examples, but in this case for the entire training dataset.The predictions are then used as inputs to train the meta-learner..Level 0 models are then trained on the entire training dataset and together with the meta-learner, the stacked model can be used to make predictions on new data.In practice, it is common to use different algorithms to prepare each of the level 0 models, to provide a diverse set of predictions.… stacking is not normally used to combine models of the same type […] it is applied to models built by different learning algorithms.— Practical Machine Learning Tools and Techniques, Second Edition, 2005.It is also common to use a simple linear model to combine the predictions..Because use of a linear model is common, stacking is more recently referred to as “model blending” or simply “blending,” especially in machine learning competitions.… the multi-response least squares linear regression technique should be employed as the high-level generalizer..This technique provides a method of combining level-0 models’ confidence— Issues in Stacked Generalization, 1999.A stacked generalization ensemble can be developed for regression and classification problems..In the case of classification problems, better results have been seen when using the prediction of class probabilities as input to the meta-learner instead of class labels.… class probabilities should be used instead of the single predicted class as input attributes for higher-level learning..The class probabilities serve as the confidence measure for the prediction made.— Issues in Stacked Generalization, 1999.Now that we are familiar with stacked generalization, we can work through a case study of developing a stacked deep learning model.Take my free 7-day email crash course now (with sample code).Click to sign-up and also get a free PDF Ebook version of the course.Download Your FREE Mini-CourseWe will use a small multi-class classification problem as the basis to demonstrate the stacking ensemble.The scikit-learn class provides the make_blobs() function that can be used to create a multi-class classification problem with the prescribed number of samples, input variables, classes, and variance of samples within a class.The problem has two input variables (to represent the x and y coordinates of the points) and a standard deviation of 2.0 for points within each group..We will use the same random state (seed for the pseudorandom number generator) to ensure that we always get the same data points.The results are the input and output elements of a dataset that we can model.In order to get a feeling for the complexity of the problem, we can graph each point on a two-dimensional scatter plot and color each point by class value.The complete example is listed below.Running the example creates a scatter plot of the entire dataset.. More details

Leave a Reply