More machine learning for people who know nothing about machine learning

The formula for the recall is the number of true positives divided by the number of true positives plus the number of false negatives.

[Figure: Recall of spam classifier]

What this really means is that, in the table above, we only consider the two values in the left column, and we divide 70 by 75 (the total of the column), which gives 93%.

So, now that we know how to calculate these two values, what do they mean intuitively?

The precision of the model is a way of describing how many of the emails that the model said were spam were actually spam. A value of 93% means that for every 100 emails the model says are spam, seven of them will actually not be. It is easy to imagine how this would be annoying. Even though the accuracy seemed high at 90%, and the precision seemed high at 93%, the model turns out not to be that good.

To see this, instead of spam emails, consider a model that decides whether a patient has cancer or not. In this case, those seven emails from above are actually seven healthy patients who get told that they have cancer. The cost of that 93% precision here ends up causing people a lot of suffering.

The recall of the model is a way of describing how many of the actual spam emails were detected as spam. A value of 93% means that for every 100 spam emails, the model will say that seven of them are not spam. This would lead to a slow build-up of spam emails making their way into your inbox. Also annoying.

Again, in the case of a model that detects cancer in patients, the outcome this time is even worse. Those seven emails are actually seven patients who have cancer but are told that they don’t!

In a nutshell, the precision is a good value to use when the cost of false positives is high, i.e. the cost of answering “Yes” when the answer is actually “No”. The recall is a good value to use when the cost of false negatives is high, i.e. the cost of answering “No” when the answer is actually “Yes”.

Wrapping up

Like in the last article, I’ll recap the new things we’ve learnt here to hopefully solidify them in your mind.

Quantitative models: These are machine learning models that answer questions with numeric answers, e.g. 150cm, 42 days, 20 cars, etc.

Qualitative models: These are machine learning models that answer questions with answers that are not numeric, e.g. “Yes or No”, “Rainy, Snowy, or Sunny”, “Happy, Sad, Excited, or Angry”, etc.

Accuracy: The accuracy of a model describes how often the model answers the question correctly.

Supervised learning: This is when we train a model using data where we already know what the answer should be, so that we can guide the model in its training.

Precision: The precision of a model describes how many of the model’s “Yes” answers were actually correct.

Recall: The recall of a model describes how many of the answers that should be “Yes” the model got correct.

These last two can take some time to get your head around, so I’d advise working through a few examples until you get it.

There is more that I want to cover, but that will have to wait until the next article, as I think this one has gone on for long enough!

Thanks for reading, and I hope I’ve managed to teach you a thing or two :).
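If you would like to work the precision and recall numbers from this article through in code, here is a minimal Python sketch. The 70 true positives and 5 false negatives are the two values from the left column of the table. The 5 false positives and 20 true negatives are my own assumption, picked so that the precision comes out at 93% and the accuracy at 90% as quoted above, so treat them as illustrative rather than as the table's actual values. The function and variable names are made up for this example.

```python
# Counts for the spam classifier.
true_positives = 70   # from the article: spam that was flagged as spam
false_negatives = 5   # from the article: spam that the model missed
false_positives = 5   # ASSUMED: not spam, but flagged as spam
true_negatives = 20   # ASSUMED: not spam, and correctly left alone


def precision(tp, fp):
    """Of everything the model answered "Yes" to, what fraction really was "Yes"?"""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of everything that really was "Yes", what fraction did the model catch?"""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Of all the answers the model gave, what fraction were correct?"""
    return (tp + tn) / (tp + tn + fp + fn)


spam_precision = precision(true_positives, false_positives)
spam_recall = recall(true_positives, false_negatives)
spam_accuracy = accuracy(
    true_positives, true_negatives, false_positives, false_negatives
)

print(f"Precision: {spam_precision:.0%}")  # Precision: 93%
print(f"Recall:    {spam_recall:.0%}")     # Recall:    93%
print(f"Accuracy:  {spam_accuracy:.0%}")   # Accuracy:  90%
```

Try changing the false positive and false negative counts and re-running it: raising only the false positives drags the precision down while the recall stays put, and raising only the false negatives does the opposite.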
