A respectable score, and I can take a deep breath before continuing to improve.
There are about 40 features that were used to train my model, and I wanted to see whether they were all contributing to the outcome. I ran a feature-importance check on my most recent model, then looked at the top 15 features.
Using that knowledge, I continued to upgrade my model: I chose the 8 features I thought were most important and ran another model.
That model scored 78%.

Day 4

This is what I like to call iteration day.
I used my XGBClassifier and tweaked things until there was nothing left to try.
I ran probably two dozen models, changing one small item each time, until I ran out of time (and patience).

Model 1 — 8 features, simple imputer, standard scaler: 78%
Model 2 — 8 features, simple imputer, no scaler: 78%
Model 3 — 8 features, tweaked parameters of simple imputer, standard scaler: 78%
Model 4 — 8 features, simple imputer, robust scaler: 79%
Model 5 — 9 features, simple imputer, standard scaler: 79%

And on and on and on I went.
I finally breached an accuracy score of 80%, but I won't go into detail about how I got there.
If you want to see the work behind my week, you can take a look at the code in my GitHub repository here.
Things I learned

Simple is (mostly) better.
Domain knowledge is an important part of understanding relationships in data.
Cleaning your data can make a substantial difference.
Iteration can be a productive method if you have the time.