Performance Measurement For Classification

Srajan Gupta
Jan 2

Hi, everybody.

So this is my second post on Data Science and Machine Learning.

This post is about Performance Measurement when solving a classification problem using machine learning.

So first, what is a classification problem?

Very simply, it means sorting the data into classes.

In other words, we have to predict which class a particular input belongs to.

For example, we are given a set of images which includes cats, dogs and rabbits.

The task is to predict which of these animals an input image belongs to.

Performance measurement is a crucial step for any problem because it tells us how accurate the model's results are.

To learn about performance measurement, let's take a digit-recognition problem: predicting which digit (0 to 9) an image shows.

So let's begin with our code.

Importing the required libraries: numpy, pandas, matplotlib and scikit-learn (sklearn).

>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from sklearn.linear_model import SGDClassifier
>>> # Cross validation
>>> from sklearn.base import clone
>>> from sklearn.model_selection import cross_val_score
>>> # Confusion matrix
>>> from sklearn.metrics import confusion_matrix
>>> # Precision and recall
>>> from sklearn.metrics import precision_score, recall_score, f1_score
>>> # ROC curve
>>> from sklearn.metrics import roc_curve, roc_auc_score

The dataset we have includes two CSV files, train.csv and test.csv. Let's load them into pandas DataFrames.

>>> train = pd.read_csv('train.csv')
>>> test = pd.read_csv('test.csv')

Let's check the number of rows and columns in our dataset.

>>> train.shape
(42000, 785)

The train dataset has 42000 rows and 785 columns.

>>> test.shape
(28000, 784)

The test dataset has 28000 rows and 784 columns.

The test dataset has one fewer column than the train dataset because it does not have the 'label' column, which holds the true values.

>>> y = train['label']
>>> del train['label']
>>> X = train

Now, y contains the 'label' values and X contains all the remaining feature columns: the 784 pixel values of each image.

Let's take an image from the dataset X (row 36001) and verify that it matches the true value of the digit.

>>> some_image = X.values[36001]
>>> some_image_reshape = some_image.reshape(28, 28)
>>> plt.imshow(some_image_reshape)
>>> y.values[36001]
5

So, yes. We can clearly see that the digit in the image is 5, and the true value is also 5.
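The reshape(28, 28) call works because each row of X holds one image as 784 = 28 × 28 pixel values. A minimal sketch, assuming the pixels are stored row by row (which the correctly displayed digit suggests), shows how a flat index maps to image coordinates; `flat_row` is a synthetic stand-in for a real row of X:

```python
import numpy as np

# Synthetic stand-in for one row of X: values 0..783 so each position is easy to identify.
flat_row = np.arange(784)

# reshape(28, 28) fills the image row by row (C order), 28 pixels per image row.
image = flat_row.reshape(28, 28)

# Pixel (r, c) of the image is element 28 * r + c of the flat row.
r, c = 3, 5
assert image[r, c] == flat_row[28 * r + c]

print(image.shape)  # (28, 28)
print(image[1, 0])  # 28: the first pixel of the second image row
```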

Initializing the Stochastic Gradient Descent (SGD) classifier:

>>> clf = SGDClassifier(random_state=42)

Building the model:

>>> clf.fit(X, y)
SGDClassifier(alpha=0.0001, average=False, class_weight=None,
       early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
       l1_ratio=0.15, learning_rate='optimal', loss='hinge', max_iter=None,
       n_iter=None, n_iter_no_change=5, n_jobs=None, penalty='l2',
       power_t=0.5, random_state=42, shuffle=True, tol=None,
       validation_fraction=0.1, verbose=0, warm_start=False)

Predicting the results:

>>> y_pred = clf.predict(X)

Let us test an arbitrary image:

>>> plt.imshow(X.values[2000].reshape(28, 28))
>>> y_pred[2000]
3

Great. Our classifier has made the right prediction. We can clearly see that the digit in the image is 3, and our model also predicts 3, which is correct.
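Under the hood, SGDClassifier handles this 10-class problem with a one-versus-all scheme: it trains one binary classifier per digit and predicts the digit whose classifier gives the highest score. A sketch illustrating that, using scikit-learn's small bundled `load_digits` dataset (8×8 images) as a stand-in for the post's data; `X_small`, `y_small` and `clf_small` are illustrative names:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# load_digits: a small 8x8-pixel digits dataset bundled with scikit-learn (stand-in data).
X_small, y_small = load_digits(return_X_y=True)

clf_small = SGDClassifier(random_state=42)
clf_small.fit(X_small, y_small)

# One score per class for each sample: shape (n_samples, n_classes).
scores = clf_small.decision_function(X_small[:5])
print(scores.shape)  # (5, 10)

# predict() returns the class whose binary classifier gives the highest score.
manual_pred = clf_small.classes_[scores.argmax(axis=1)]
print(manual_pred, clf_small.predict(X_small[:5]))
```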

But checking a single prediction tells us very little about the accuracy of the model.
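One more caveat: `y_pred` here comes from `clf.predict(X)` on the very data the model was trained on, so metrics computed from it tend to be optimistic. A minimal sketch of a common remedy, again on the bundled `load_digits` data as a stand-in, uses scikit-learn's `cross_val_predict` to get out-of-fold predictions, where every sample is predicted by a model that never saw it during training:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict

X_small, y_small = load_digits(return_X_y=True)

# Accuracy on the same data the model was trained on (what clf.predict(X) measures).
clf_small = SGDClassifier(random_state=42)
clf_small.fit(X_small, y_small)
train_acc = accuracy_score(y_small, clf_small.predict(X_small))

# Out-of-fold predictions: each sample is predicted by a model trained on the other folds.
y_pred_oof = cross_val_predict(SGDClassifier(random_state=42), X_small, y_small, cv=3)
oof_acc = accuracy_score(y_small, y_pred_oof)

print(f"training accuracy:    {train_acc:.3f}")
print(f"out-of-fold accuracy: {oof_acc:.3f}")
```

The out-of-fold predictions (`y_pred_oof`) can be passed to `confusion_matrix`, `precision_score` and the other metric functions in place of training-set predictions.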

Let's take a look at some methods for measuring the performance of a classification model.

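Cross Validation

In this method, we split the dataset into K subsets (folds), train the model on all but one of them, and measure its accuracy on the remaining fold, repeating until every fold has been used once for evaluation. Here K = 3.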

>>> cross_val_score(clf, X, y, cv=3, scoring="accuracy")
array([0.86418166, 0.86555222, 0.88841263])

The above array gives the accuracy of the model on each of the three held-out folds.
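The import of `clone` above hints at what `cross_val_score` does internally. A rough sketch of the same procedure written out by hand, assuming scikit-learn's documented behaviour of using stratified (class-balanced) folds without shuffling when `cv` is an integer and the estimator is a classifier; the bundled `load_digits` data stands in for the post's dataset:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X_small, y_small = load_digits(return_X_y=True)
clf_small = SGDClassifier(random_state=42)

# Stratified folds keep roughly the same class proportions in every fold.
skf = StratifiedKFold(n_splits=3)
manual_scores = []
for train_idx, test_idx in skf.split(X_small, y_small):
    fold_clf = clone(clf_small)  # fresh, unfitted copy with the same hyperparameters
    fold_clf.fit(X_small[train_idx], y_small[train_idx])
    fold_pred = fold_clf.predict(X_small[test_idx])
    manual_scores.append(np.mean(fold_pred == y_small[test_idx]))  # fold accuracy

sk_scores = cross_val_score(clf_small, X_small, y_small, cv=3, scoring="accuracy")
print(np.round(manual_scores, 4))
print(np.round(sk_scores, 4))
```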

Accuracy, however, is not a satisfactory measure when the dataset is skewed: if one class is far more frequent than the others, a model can reach high accuracy simply by predicting that class most of the time.
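To see the problem, consider the binary question "is this digit a 5?", where only about one sample in ten is positive. A minimal sketch with scikit-learn's `DummyClassifier`, which ignores the input entirely, on the bundled `load_digits` data (a stand-in for the post's dataset):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.dummy import DummyClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import cross_val_score

X_small, y_small = load_digits(return_X_y=True)
y_small_5 = (y_small == 5).astype(int)  # 1 for a 5 (about 10% of samples), 0 otherwise

# A "classifier" that always predicts the most frequent class, i.e. "not 5".
never_5 = DummyClassifier(strategy="most_frequent")
acc = cross_val_score(never_5, X_small, y_small_5, cv=3, scoring="accuracy")
print(np.round(acc, 3))  # roughly 0.9 on every fold

# Yet it never finds a single 5.
never_5.fit(X_small, y_small_5)
rec = recall_score(y_small_5, never_5.predict(X_small))
print(rec)  # 0.0
```

A model that never recognizes a 5 still scores about 90% accuracy, which is why the measures that follow look at the kinds of errors a classifier makes rather than just their total.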

Let's move on to our next performance measurement, the confusion matrix.

Confusion Matrix

A confusion matrix is a much better way to measure the performance of a classifier.

The general idea is to count the number of times the classifier confused the images of class A with class B.

For example, in our problem it counts how many times the classifier confused images of 5s with images of 3s.

To find that count, we look at the cell in row 5, column 3 (rows and columns are indexed by digit, starting from 0). Its value is the number of actual 5s that were predicted as 3s, which is 197 in the matrix below.

>>> confusion_matrix(y, y_pred)
array([[4075,    0,    2,    6,   12,   10,   14,    1,    3,    9],
       [   0, 4587,    4,   32,    9,    7,    4,    2,   30,    9],
       [ 103,   70, 2967,  600,  127,   66,   88,   64,   49,   43],
       [  28,   19,   28, 3995,   16,  125,   10,   19,   15,   96],
       [  17,   17,    1,    3, 3909,    8,   11,    3,    4,   99],
       [  86,   23,    8,  197,   77, 3276,   48,    6,   24,   50],
       [  42,   16,    6,    8,   57,  131, 3870,    1,    4,    2],
       [  24,   17,   26,   33,  131,   15,    2, 3768,    5,  380],
       [  59,  139,   15,  406,  258,  966,   17,   15, 1987,  201],
       [  38,   12,    2,   64,  345,   42,    0,   44,    4, 3637]])

The above code returns a matrix of such counts.

Each row in the confusion matrix represents an actual class and each column represents a predicted class.
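Building a confusion matrix is just counting (actual, predicted) pairs. A minimal sketch on a tiny hypothetical three-class example, assuming the class labels are the integers 0 to n_classes - 1; `count_confusions` is a hypothetical helper, checked against scikit-learn's `confusion_matrix`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predictions for three classes (0, 1, 2).
y_true = np.array([0, 1, 2, 2, 1, 0])
y_hat = np.array([0, 2, 2, 2, 1, 1])

def count_confusions(y_true, y_pred, n_classes):
    """Hypothetical helper: cell [i, j] counts samples of actual class i predicted as j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # add 1 at each (actual, predicted) pair
    return cm

cm = count_confusions(y_true, y_hat, 3)
print(cm)
# [[1 1 0]
#  [0 1 1]
#  [0 0 2]]
print(cm[1, 2])  # 1: one actual "1" was predicted as "2"
```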

Let's move on to our next performance measurement technique, precision and recall.

Precision and Recall

Let's understand some basic terminology before studying this.

Let's take an example of a binary classifier which predicts if a digit is 5 or not.

True Positive (TP) – inputs that were positive (they were 5) and that the classifier predicted correctly (it predicted them to be 5).

True Negative (TN) – inputs that were negative (they were not 5) and that the classifier also predicted to be negative (it predicted them to be not 5).

False Positive (FP) – inputs that were negative (they were not 5) but that the classifier predicted to be positive (it predicted them to be 5).

False Negative (FN) – inputs that were positive (they were 5) but that the classifier predicted to be negative (it predicted them to be not 5).
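The four counts can be computed directly from boolean masks. A sketch on hypothetical labels (`y_true`, `y_hat`), reduced to the "is it a 5?" question used above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical digit labels and predictions.
y_true = np.array([5, 3, 5, 1, 5, 2])
y_hat = np.array([5, 5, 3, 1, 5, 2])
actual_5 = y_true == 5
predicted_5 = y_hat == 5

tp = np.sum(actual_5 & predicted_5)    # was 5, predicted 5
tn = np.sum(~actual_5 & ~predicted_5)  # was not 5, predicted not 5
fp = np.sum(~actual_5 & predicted_5)   # was not 5, predicted 5
fn = np.sum(actual_5 & ~predicted_5)   # was 5, predicted not 5
print(tp, tn, fp, fn)  # 2 2 1 1

# For binary labels, scikit-learn's confusion matrix flattens to (tn, fp, fn, tp).
print(confusion_matrix(actual_5, predicted_5).ravel())  # [2 1 1 2]
```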

Precision is defined as:

Precision = TP / (TP + FP)

that is, the fraction of inputs predicted to be 5 that really are 5. Recall is defined as:

Recall = TP / (TP + FN)

that is, the fraction of actual 5s that the classifier finds.

>>> precision_score(y, y_pred, average=None)
array([0.9112254 , 0.93612245, 0.96992481, 0.74756737, 0.7911354 ,
       0.70512269, 0.95226378, 0.96048942, 0.93505882, 0.80357932])

The above function returns the precision score for each class, from 0 to 9.

We pass average=None because the problem here is multi-class: by default (average='binary'), the function only works for binary classification. With average=None, it returns one score per class.
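The `average` parameter also accepts values that collapse the per-class scores into one number. A small sketch on hypothetical three-class labels: `"macro"` takes the unweighted mean of the per-class scores, while `"micro"` pools the TP and FP counts of all classes, which for single-label multi-class data makes micro-averaged precision equal to plain accuracy:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical three-class labels and predictions.
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_hat = np.array([0, 0, 0, 1, 1, 2, 2, 0])

per_class = precision_score(y_true, y_hat, average=None)  # one score per class
macro = precision_score(y_true, y_hat, average="macro")   # unweighted mean of per_class
micro = precision_score(y_true, y_hat, average="micro")   # pools TP and FP over all classes

print(per_class)  # precision of classes 0, 1, 2: 0.75, 0.5, 0.5
print(macro)      # about 0.583
print(micro, accuracy_score(y_true, y_hat))  # both 0.625
```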

>>> recall_score(y, y_pred, average=None)
array([0.98620523, 0.9792912 , 0.71031841, 0.91817973, 0.95997053,
       0.86324111, 0.93546048, 0.85616905, 0.4890475 , 0.86843362])

The above function returns the recall score for each class, from 0 to 9.
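Per-class precision and recall can also be read straight off the confusion matrix: for each digit, precision divides the diagonal count by its column sum (all predictions of that digit) and recall divides it by its row sum (all actual samples of that digit). A sketch that recomputes the two arrays above from the confusion matrix printed earlier in this post:

```python
import numpy as np

# Confusion matrix reported earlier (rows: actual digit, columns: predicted digit).
cm = np.array([
    [4075,    0,    2,    6,   12,   10,   14,    1,    3,    9],
    [   0, 4587,    4,   32,    9,    7,    4,    2,   30,    9],
    [ 103,   70, 2967,  600,  127,   66,   88,   64,   49,   43],
    [  28,   19,   28, 3995,   16,  125,   10,   19,   15,   96],
    [  17,   17,    1,    3, 3909,    8,   11,    3,    4,   99],
    [  86,   23,    8,  197,   77, 3276,   48,    6,   24,   50],
    [  42,   16,    6,    8,   57,  131, 3870,    1,    4,    2],
    [  24,   17,   26,   33,  131,   15,    2, 3768,    5,  380],
    [  59,  139,   15,  406,  258,  966,   17,   15, 1987,  201],
    [  38,   12,    2,   64,  345,   42,    0,   44,    4, 3637],
])

tp = np.diag(cm)                 # correct predictions for each digit
precision = tp / cm.sum(axis=0)  # TP / (TP + FP): column sum = all predictions of that digit
recall = tp / cm.sum(axis=1)     # TP / (TP + FN): row sum = all actual samples of that digit

print(np.round(precision, 4))
print(np.round(recall, 4))
```

Digit 8 has the weakest recall (about 0.49): row 8 of the matrix shows that 966 actual 8s were predicted as 5 and 406 as 3.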

Precision and recall can be combined into a single metric called the F1 score, which is their harmonic mean:

F1 = 2 / (1/precision + 1/recall)

>>> f1_score(y, y_pred, average=None)
array([0.94723384, 0.95722037, 0.82006633, 0.82413615, 0.86741374,
       0.77621135, 0.94378734, 0.90533397, 0.64221073, 0.83474868])

The above function returns the F1 score for each class, from 0 to 9.
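As a check on the formula, a sketch that recomputes the F1 scores above from the printed precision and recall arrays:

```python
import numpy as np

# Per-class precision and recall reported above.
precision = np.array([0.9112254, 0.93612245, 0.96992481, 0.74756737, 0.7911354,
                      0.70512269, 0.95226378, 0.96048942, 0.93505882, 0.80357932])
recall = np.array([0.98620523, 0.9792912, 0.71031841, 0.91817973, 0.95997053,
                   0.86324111, 0.93546048, 0.85616905, 0.4890475, 0.86843362])

# Harmonic mean 2 / (1/P + 1/R), which simplifies to 2PR / (P + R).
f1 = 2 / (1 / precision + 1 / recall)

print(np.round(f1, 4))
```

Because the harmonic mean is pulled toward the smaller of the two values, digit 8, with high precision (0.935) but low recall (0.489), ends up with the lowest F1 score (0.642).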

Let's now take a look at our next performance measurement technique, the ROC curve.

The ROC Curve (Receiver Operating Characteristic)

In its basic form, the ROC curve applies to a binary classifier: it plots the true positive rate (recall) against the false positive rate as the classifier's decision threshold varies.

Since it works for a binary classifier, let's build two label arrays: y_5, which contains 1 where the corresponding digit in y is 5 and 0 where it is not, and y_pred_5, which does the same for the predictions in y_pred.

>>> y_5 = (y == 5).astype(int)            # 1 if the true digit is 5, else 0
>>> y_pred_5 = (y_pred == 5).astype(int)  # 1 if the predicted digit is 5, else 0
>>> fpr, tpr, thresholds = roc_curve(y_5, y_pred_5)

fpr: False Positive Rate
tpr: True Positive Rate

Plotting the FPR vs. TPR using Matplotlib:

>>> plt.plot(fpr, tpr, label=None)
>>> plt.plot([0, 1], [0, 1], 'k--')  # dashed diagonal: a purely random classifier
>>> plt.axis([0, 1, 0, 1])
>>> plt.xlabel('False Positive Rate')
>>> plt.ylabel('True Positive Rate')
>>> plt.show()

The area under the ROC curve (AUC) is not the classifier's accuracy: it measures how well the classifier ranks positive inputs above negative ones, i.e. the probability that a randomly chosen 5 receives a higher score than a randomly chosen non-5.

A perfect classifier has an area equal to 1, and a purely random classifier has an area equal to 0.5.
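Because `y_pred_5` holds hard 0/1 predictions, the curve plotted above has only one point between (0, 0) and (1, 1). A fuller ROC curve uses the classifier's continuous scores, giving one point per threshold. A sketch of that approach, assuming a separate binary "5 vs. not 5" `SGDClassifier` trained on the bundled `load_digits` data as a stand-in, with out-of-fold scores from `cross_val_predict(..., method="decision_function")`:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_predict

X_small, y_small = load_digits(return_X_y=True)
y_small_5 = (y_small == 5).astype(int)  # 1 for a 5, 0 otherwise

# Out-of-fold decision scores from a binary "5 vs. not 5" classifier.
scores_5 = cross_val_predict(SGDClassifier(random_state=42), X_small, y_small_5,
                             cv=3, method="decision_function")

# Each threshold on the scores gives one (FPR, TPR) point on the curve.
fpr, tpr, thresholds = roc_curve(y_small_5, scores_5)
auc = roc_auc_score(y_small_5, scores_5)

print(len(thresholds), "thresholds, AUC =", round(auc, 3))
```

The resulting `fpr` and `tpr` arrays can be plotted with the same Matplotlib calls used above.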

>>> roc_auc_score(y_5, y_pred_5)
0.9136909629919309

The ROC AUC score of our "5 vs. not 5" classifier is therefore about 0.914.
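With hard 0/1 predictions, the ROC AUC reduces to the average of the true positive rate and the true negative rate (also called balanced accuracy). A sketch that checks this against the value above, rebuilding `y_5` and `y_pred_5` from the counts in the confusion matrix (3795 actual 5s in row 5, 4646 predicted 5s in column 5, 3276 of them correct):

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Counts for "5 vs. not 5" read off the confusion matrix (42000 samples in total).
tp = 3276                   # actual 5, predicted 5 (row 5, column 5)
fn = 3795 - tp              # actual 5, predicted another digit (rest of row 5)
fp = 4646 - tp              # predicted 5, actually another digit (rest of column 5)
tn = 42000 - tp - fn - fp   # everything else

# Rebuild 0/1 label arrays with exactly those counts.
y_5 = np.array([1] * tp + [1] * fn + [0] * fp + [0] * tn)
y_pred_5 = np.array([1] * tp + [0] * fn + [1] * fp + [0] * tn)

auc = roc_auc_score(y_5, y_pred_5)
tpr = tp / (tp + fn)  # true positive rate (recall of class 5)
tnr = tn / (tn + fp)  # true negative rate
print(auc, (tpr + tnr) / 2, balanced_accuracy_score(y_5, y_pred_5))
```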

Hope you liked the post.

Please share your valuable feedback in the comment section below.
