How to Calibrate Undersampled Model Scores

Imbalanced data problems in binary prediction models, and a simple but effective way to take care of them with Python and R.

Emre Rencberoglu · Feb 12

[Image: Random Undersampling Flower]

Imbalanced Data, the Root of all Evil

Imbalanced data refers to a situation in which one of the classes forms a high majority and dominates the other classes.

For machine learning, a skewed distribution of target values might cause an accuracy bias in algorithms and affect the performance of models negatively.

At this point, the aim of the model and performance metrics are substantial.

Let’s clarify with an example.

If your target column has True classes at a rate of 1%, a model that always predicts False will be successful 99% of the time in terms of basic accuracy.
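A minimal sketch of this accuracy bias, using a hypothetical target with a 1% positive rate and a "model" that always predicts the majority class:

```python
import numpy as np

# Hypothetical target column with roughly a 1% True rate
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)

# A degenerate "model" that always predicts False (the majority class)
y_pred = np.zeros_like(y_true)

accuracy = (y_true == y_pred).mean()
print(f"accuracy: {accuracy:.3f}")  # close to 0.99 despite predicting nothing useful
```

The number 10,000 and the random seed are illustrative; the point is that accuracy alone rewards a model that never finds a single True case.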

However, such a model is useless in most cases because the costs of false positive (Type I error) and false negative (Type II error) predictions are usually not equal.

There are a couple of solutions to the imbalanced data problem, but in this article I will cover the undersampling method and the calibration process that adjusts the final scores afterward.

What is Undersampling?

Assume that your data has a binary target variable with a highly skewed ratio.

In order to balance the target ratio and increase the machine learning algorithm's focus on the minority class, rows of the majority class are removed.

This process is called undersampling and is applied in the data preparation phase, before model training.
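The procedure can be sketched in a few lines of NumPy; the class ratio and sample sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy training set with roughly 5% positives (hypothetical numbers)
y = (rng.random(20_000) < 0.05).astype(int)
X = rng.normal(size=(y.size, 3))

pos_idx = np.flatnonzero(y == 1)
neg_idx = np.flatnonzero(y == 0)

# Keep all positives; randomly sample negatives down to a 1:1 ratio
neg_keep = rng.choice(neg_idx, size=pos_idx.size, replace=False)
keep = np.concatenate([pos_idx, neg_keep])

X_under, y_under = X[keep], y[keep]
print(y.mean(), y_under.mean())  # minority share rises from ~0.05 to 0.5
```

The 1:1 ratio is just one choice; milder ratios are common, and the calibration formula later in the article works for any of them.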

[Image: The data before and after undersampling]

After the undersampling process, some side effects appear in the distribution of the model scores.

For instance, if the ratio of True classes in the training data is 5%, we expect the average of the probability predictions to be around 5% as well.

But when we manipulate the target class ratio, we also change the distribution of the prediction scores.

Random undersampling of the non-target class increases the prior probability of the target class in the training data, and it ends up with inflated probability predictions.

Is This a Problem?

If you aim to select a certain number of instances according to their prediction scores, undersampling is not a problem, because it does not change the rank order of the instances' probability scores.

For example, if you need to specify a population with the highest propensity score for your marketing campaign, you do not need to worry about the side effects of undersampling.

However, in some cases a realistic probability prediction matters:

Customer lifetime value and similar calculations need calibrated probability predictions.

Suppose that you have product A with a value of $100 and user B with a propensity score of 0.1 for product A. Then user B has a value of $100 × 0.1 = $10 in terms of marketing.

If the cost of a false positive or false negative prediction is high, realistic probability predictions are required.

In a situation where you want to identify a criminal, false positives are mostly intolerable.

Or if you try to predict whether a person has cancer, the ranking is usually meaningless; the probability, on the other hand, is vital.

“If you give all events that happen a probability of .6 and all the events that don’t happen a probability of .4, your discrimination is perfect but your calibration is miserable.” (Daniel Kahneman)

What does Calibration Change?

If you use AUC as a model evaluation metric, you cannot see any difference before and after calibration, because AUC cares about distinguishing the classes, not their probabilities.

However, if you use a metric such as log loss, which works with the likelihood function, the difference becomes visible.
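This contrast can be checked directly: any strictly increasing transform of the scores leaves the AUC untouched, while the log loss changes. The sketch below uses synthetic scores and a hypothetical odds-rescaling transform as the stand-in for calibration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

rng = np.random.default_rng(7)
y = (rng.random(5_000) < 0.05).astype(int)

# Toy scores: positives receive a higher score on average
raw = np.clip(rng.beta(2, 5, size=y.size) + 0.3 * y, 1e-6, 1 - 1e-6)

# A strictly increasing transform preserves the ranking, hence the AUC
beta = 0.2  # hypothetical rescaling constant for this demo
adjusted = beta * raw / (beta * raw - raw + 1)

print(roc_auc_score(y, raw) == roc_auc_score(y, adjusted))  # True: order unchanged
print(log_loss(y, raw), log_loss(y, adjusted))              # log loss differs
```

This is why calibration should be judged with likelihood-based metrics (log loss, Brier score) rather than ranking metrics like AUC.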

In the chart below, you can see the probability prediction distributions of the undersampled data before and after calibration.

The vertical purple line shows the prior probability of the target class in the original data.

As can be seen in the chart, the red area of the undersampled model's predictions becomes highly coherent with the prior probability after the calibration process.

[Image: The comparison of the score distribution before and after calibration]

Let’s Calibrate

To adjust the probabilities in the model output, we calibrate them.

There are two well-known calibration algorithms: Platt’s Scaling and Isotonic Regression.
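Both algorithms are available in scikit-learn through CalibratedClassifierCV; a minimal sketch with a synthetic dataset (the base model and data below are illustrative, not from the article):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 4))
y = ((X[:, 0] + rng.normal(scale=0.5, size=2_000)) > 1.2).astype(int)

base = RandomForestClassifier(n_estimators=50, random_state=0)

# method="sigmoid" is Platt's scaling; method="isotonic" fits a monotone step function
platt = CalibratedClassifierCV(base, method="sigmoid", cv=3).fit(X, y)
iso = CalibratedClassifierCV(base, method="isotonic", cv=3).fit(X, y)

print(platt.predict_proba(X[:5])[:, 1])
print(iso.predict_proba(X[:5])[:, 1])
```

Platt's scaling fits a logistic curve and suits smaller datasets; isotonic regression is more flexible but can overfit when data is scarce.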

Besides these, I want to talk about another uncomplicated calibration formula and its functions in Python and R.

Here are the explanations of the function parameters:

data: probability predictions array of the model output

train_pop: total row count in the training dataset

target_pop: total row count of the target class in the training dataset

sampled_train_pop: total row count in the training dataset after undersampling

sampled_target_pop: total row count of the target class in the training dataset after undersampling

Calibration Functions:

[Gist: Calibration function in R]

[Gist: Calibration function in Python]

How to use the function?

Let’s say your goal is to generate a model that predicts credit default probabilities, and your original training data has 50,000 rows with only 500 of them labeled as the target class.
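The embedded code does not render here, so the following is a sketch of what a function with this signature could look like, assuming it applies the standard prior-correction formula for a shift in class priors; the author's original gist may implement the adjustment differently:

```python
import numpy as np

def calibration(data, train_pop, target_pop, sampled_train_pop, sampled_target_pop):
    """Map scores from the undersampled model back toward the original prior.

    Sketch using the standard prior-correction formula; the original
    gist referenced by the article may differ in detail.
    """
    p = np.asarray(data, dtype=float)
    orig_rate = target_pop / train_pop                  # e.g. 500 / 50000 = 0.01
    samp_rate = sampled_target_pop / sampled_train_pop  # e.g. 500 / 10000 = 0.05

    # Reweight each class by the ratio of original to sampled priors,
    # then renormalize so the result is again a probability
    num = p * orig_rate / samp_rate
    den = num + (1 - p) * (1 - orig_rate) / (1 - samp_rate)
    return num / den

scores = np.array([0.05, 0.2, 0.5, 0.9])
calibrated = calibration(scores, 50_000, 500, 10_000, 500)
print(calibrated)  # each score shrinks toward the original 1% prior
```

Note that the transform is strictly increasing, so it shrinks the scores without changing their rank order, which is exactly the property the article relies on.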

When you sample your non-target instances randomly and reduce the total row count to 10,000 while conserving the 500 target rows, the calibration function call becomes calibration(model_results, 50000, 500, 10000, 500), where model_results is your model's probability output array.

After you train your model and put the results in it, your function is ready to use.

Enjoy it!

References

How to Handle Imbalanced Data: An Overview

Dealing with Imbalanced Classes in Machine Learning

Classifier calibration with Platt’s scaling and isotonic regression

Platt Scaling Conference Paper

What is Log Loss?