Receiver Operating Characteristic Curves Demystified (in Python)

The model performance is determined by looking at the area under the ROC curve (or AUC). To build the ROC curve, we first need the distribution of the model's predicted probabilities for each class. To create this probability distribution, we plot a Gaussian distribution with a different mean value for each class. For more information on the Gaussian distribution, read this blog.

```python
import numpy as np
import matplotlib.pyplot as plt

def pdf(x, std, mean):
    # Gaussian probability density function
    const = 1.0 / np.sqrt(2 * np.pi * (std**2))
    pdf_normal_dist = const * np.exp(-((x - mean)**2) / (2.0 * (std**2)))
    return pdf_normal_dist

x = np.linspace(0, 1, num=100)
good_pdf = pdf(x, 0.1, 0.4)
bad_pdf = pdf(x, 0.1, 0.6)
```

Now that we have the distributions, let's create a function to plot them.

```python
def plot_pdf(good_pdf, bad_pdf, ax):
    ax.fill(x, good_pdf, "g", alpha=0.5)
    ax.fill(x, bad_pdf, "r", alpha=0.5)
    ax.set_xlim([0, 1])
    ax.set_ylim([0, 5])
    ax.set_title("Probability Distribution", fontsize=14)
    ax.set_ylabel('Counts', fontsize=12)
    ax.set_xlabel('P(X="bad")', fontsize=12)
    ax.legend(["good", "bad"])
```

Now let's use this plot_pdf function to generate the plot:

```python
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
plot_pdf(good_pdf, bad_pdf, ax)
```

Now that we have the probability distribution of the binary classes, we can use this distribution to derive the ROC curve.

Deriving ROC Curve

To derive the ROC curve from the probability distribution, we need to calculate the True Positive Rate (TPR) and False Positive Rate (FPR) at every possible threshold on P(X="bad"): TPR = TP / (TP + FN) is the fraction of "bad" cases correctly flagged, and FPR = FP / (FP + TN) is the fraction of "good" cases incorrectly flagged. Sweeping the threshold from 1 down to 0 and accumulating the two distributions gives one (FPR, TPR) point per threshold. Using this knowledge, we create the ROC plot function:

```python
def plot_roc(good_pdf, bad_pdf, ax):
    # Totals
    total_bad = np.sum(bad_pdf)
    total_good = np.sum(good_pdf)
    # Cumulative sums
    cum_TP = 0
    cum_FP = 0
    # TPR and FPR list initialization
    TPR_list = []
    FPR_list = []
    # Iterate through all values of x, sweeping the threshold from high to low
    for i in range(len(x)):
        # We are only interested in non-zero values of bad
        if bad_pdf[i] > 0:
            cum_TP += bad_pdf[len(x) - 1 - i]
            cum_FP += good_pdf[len(x) - 1 - i]
        FPR = cum_FP / total_good
        TPR = cum_TP / total_bad
        TPR_list.append(TPR)
        FPR_list.append(FPR)
    # Calculating AUC, taking the 100 timesteps into account
    auc = np.sum(TPR_list) / 100
    # Plotting final ROC curve
    ax.plot(FPR_list, TPR_list)
    ax.plot(x, x, "--")
    ax.set_xlim([0, 1])
    ax.set_ylim([0, 1])
    ax.set_title("ROC Curve", fontsize=14)
    ax.set_ylabel('TPR', fontsize=12)
    ax.set_xlabel('FPR', fontsize=12)
    ax.grid()
    ax.legend(["AUC=%.3f" % auc])
```

Now let's use this plot_roc function to generate the plot:

```python
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
plot_roc(good_pdf, bad_pdf, ax)
```

Now let's plot the probability distribution and the ROC curve next to each other for visual comparison:

```python
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
plot_pdf(good_pdf, bad_pdf, ax[0])
plot_roc(good_pdf, bad_pdf, ax[1])
plt.tight_layout()
```

Effect of Class Separation

Now that we can derive both plots, let's see how the ROC curve changes as the class separation (i.e. the model performance) improves. We do this by altering the mean value of the Gaussian in the probability distributions.

```python
x = np.linspace(0, 1, num=100)
fig, ax = plt.subplots(3, 2, figsize=(10, 12))
means_tuples = [(0.5, 0.5), (0.4, 0.6), (0.3, 0.7)]
i = 0
for good_mean, bad_mean in means_tuples:
    good_pdf = pdf(x, 0.1, good_mean)
    bad_pdf = pdf(x, 0.1, bad_mean)
    plot_pdf(good_pdf, bad_pdf, ax[i, 0])
    plot_roc(good_pdf, bad_pdf, ax[i, 1])
    i += 1
plt.tight_layout()
```

As you can see, the AUC increases as we increase the separation between the classes.

Looking Beyond The AUC

Beyond AUC, the ROC curve can also help debug a model: the region where the curve sags toward the diagonal tells you in which score range the model is making its mistakes.
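As a sanity check on the hand-rolled curve above (scikit-learn is not used elsewhere in this post, so treat this as an optional aside), we can draw actual samples from the same two Gaussians and compare against scikit-learn's roc_curve and roc_auc_score. The sample size and random seed below are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Simulated scores: "good" (label 0) centered at 0.4, "bad" (label 1) at 0.6
good_scores = rng.normal(loc=0.4, scale=0.1, size=10_000)
bad_scores = rng.normal(loc=0.6, scale=0.1, size=10_000)

y_true = np.concatenate([np.zeros_like(good_scores), np.ones_like(bad_scores)])
y_score = np.concatenate([good_scores, bad_scores])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC: %.3f" % roc_auc_score(y_true, y_score))  # close to the plotted value
```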
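For this Gaussian setup there is also a closed-form answer to compare against: AUC equals the probability that a random "bad" score exceeds a random "good" score, and since the difference of two independent Gaussians is itself Gaussian, AUC = Φ((μ_bad - μ_good) / sqrt(σ_good² + σ_bad²)), where Φ is the standard normal CDF. A minimal sketch of this check (scipy is an extra dependency not used in the original code):

```python
from scipy.stats import norm

def gaussian_auc(good_mean, bad_mean, std=0.1):
    # AUC = P(bad score > good score) for two independent, equal-variance Gaussians
    return norm.cdf((bad_mean - good_mean) / (std * 2**0.5))

for good_mean, bad_mean in [(0.5, 0.5), (0.4, 0.6), (0.3, 0.7)]:
    print(good_mean, bad_mean, round(gaussian_auc(good_mean, bad_mean), 3))
# Prints 0.5, 0.921, and 0.998, matching the trend in the plots above
```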
