‘Meta’ machine learning packages in R

I was excited to learn specific packages for GLM, CART, and others. The more I learned about applying the individual models, the more curious (and greedy) I became to learn additional models, though they were usually less trivial to understand and implement with my datasets. My motivation to learn and implement more models was partly to achieve higher predictive accuracy on my datasets. However, this competitive race was not the only reason. Guided by the 'no free lunch' theorem of Wolpert and Macready, which argues that no single model will always have the best performance on every type of dataset, I had a legitimate scientific excuse to keep trying as many models as possible.

Too early excitement?

For the basic models, with abundant documentation and examples, learning how to apply them was almost straightforward. However, many other, more complex models were less trivial to apply. Their complexity often stemmed from additional tunable parameters that either lacked clear usage guidelines or required techniques beyond my understanding at the time. For example, choosing the alpha and lambda parameters of a penalized regression such as LASSO via internal cross-validation, or other parameter-tuning approaches.

In parallel to my endless search for new models, another challenge I experienced was implementing methods that deal with the issue of overfitting, such as cross-validation and bootstrapping. Even though there were packages that specialized in such methods, it was not trivial to integrate them and wrap them around the various models/packages. The same was true for advanced search methods for tuning parameters, and for ensembling (specifically stacking).

Even though I thought I had a map to guide me through the various models and packages, disappointment came sooner rather than later. It was simply too hard to integrate all of these methods together. Not only can each of these models be complex to understand in itself; wrapping multiple models together, nested within each other and tied into other 'heavy lifting' approaches such as resampling, benchmarking, stacking, and others, made it even more complex and overwhelming.

Frustrated and exhausted, I still hoped there would be some efficient way to integrate it all, since in the end all of these methods share the same 'statistical learning' workflow: train/fit a model on a training set, predict outcomes for a new (test) dataset, and measure some performance metric. Yet each of these packages had its own 'dialect', reflecting the diverse background of the open-source R developer community.
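The internal cross-validation for tuning a penalized regression can be sketched with the glmnet package; the dataset and fold count here are purely illustrative:

```r
# Tuning the LASSO penalty (lambda) via internal cross-validation with glmnet.
library(glmnet)

x <- as.matrix(mtcars[, -1])   # predictors
y <- mtcars$mpg                # outcome

# alpha = 1 selects the LASSO penalty; cv.glmnet runs k-fold CV over a
# grid of lambda values and records the cross-validated error for each.
cvfit <- cv.glmnet(x, y, alpha = 1, nfolds = 5)

cvfit$lambda.min               # lambda with the lowest CV error
coef(cvfit, s = "lambda.min")  # coefficients at that lambda
```

This handles the lambda search automatically, but note that alpha (the elastic-net mixing parameter) still has to be tuned by an outer loop you write yourself, which is exactly the kind of wrapping work described above.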
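A 'meta' package such as caret wraps that shared workflow behind one interface. A minimal sketch, where the model (CART via rpart), resampling scheme, and tuning-grid size are illustrative choices rather than recommendations:

```r
# One unified train / predict / measure workflow via the caret meta-package.
library(caret)

set.seed(42)
ctrl <- trainControl(method = "cv", number = 5)   # 5-fold cross-validation

# train() handles resampling and parameter tuning for any supported model;
# tuneLength = 5 tries 5 values of rpart's complexity parameter.
fit <- train(Species ~ ., data = iris, method = "rpart",
             trControl = ctrl, tuneLength = 5)

preds <- predict(fit, newdata = iris)             # predict on new data
confusionMatrix(preds, iris$Species)              # measure performance
```

Swapping `method = "rpart"` for, say, `"glmnet"` or `"rf"` reuses the same resampling and tuning machinery, which is precisely the integration that is so painful to build by hand.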
