The Highs, Lows, and Plateaus of Creating a Machine Learning Model

A respectable score, and I can take a deep breath before continuing to improve.

About 40 features were used to train my model, and I wanted to see whether they were all contributing to the outcome.

I ran a feature-importance check on my most recent model, then looked at the top 15 features.
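A minimal sketch of what that check can look like, assuming a fitted XGBClassifier named `model` and the training DataFrame `X_train` (both placeholder names, not from my actual notebook):

```python
import pandas as pd

# feature_importances_ comes from XGBClassifier's scikit-learn API;
# pairing it with the column names makes the ranking readable.
importances = pd.Series(model.feature_importances_, index=X_train.columns)

# Top 15 features, most important first.
print(importances.sort_values(ascending=False).head(15))
```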

Using that knowledge, I continued to upgrade my model.

I chose 8 of what I thought were the most important features and ran another model.

78%.

Day 4

This is what I like to call iteration day.

I used my XGBClassifier and tweaked things until there was nothing else to try.

I ran probably two dozen models, changing one small item each time, until I ran out of time (and patience).

Model 1 — 8 features, simple imputer, standard scaler: 78%
Model 2 — 8 features, simple imputer, no scaler: 78%
Model 3 — 8 features, tweaked parameters of simple imputer, standard scaler: 78%
Model 4 — 8 features, simple imputer, robust scaler: 79%
Model 5 — 9 features, simple imputer, standard scaler: 79%

And on and on and on I went.
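A minimal sketch of one such iteration, assuming a feature DataFrame `X`, a binary target `y`, and a hypothetical list of the eight chosen feature names (none of these identifiers come from my actual code):

```python
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler  # or RobustScaler
from xgboost import XGBClassifier

# Hypothetical subset of the eight most important features.
top_features = ["feature_1", "feature_2", "feature_3", "feature_4",
                "feature_5", "feature_6", "feature_7", "feature_8"]

X_train, X_test, y_train, y_test = train_test_split(
    X[top_features], y, test_size=0.2, random_state=42
)

# Each "model" in the list above is this pipeline with one piece swapped:
# a different scaler, tweaked imputer parameters, one more feature, etc.
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("classifier", XGBClassifier()),
])
pipe.fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, pipe.predict(X_test)):.2f}")
```

Swapping StandardScaler for RobustScaler, or dropping the scaler step entirely, is a one-line change, which is what made running two dozen variants feasible.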

I finally breached an accuracy score of 80%, but I won't go into detail about how I got there.

If you want to see the work behind my week, you can take a look at the code in my GitHub repository here.

Things I learned

Simple is (mostly) better.

Domain knowledge is an important part of understanding relationships in data.

Cleaning your data can make a substantial difference.

Iteration can be a productive method if you have the time.

