How To Develop a Machine Learning Model From Scratch

Clustering?
What is the expected improvement?
What is the current status of the target feature?
How is the target feature going to be measured?

Not every problem can be solved. Until we have a working model, we can only make certain hypotheses:
Our outputs can be predicted given the inputs.
Our available data is sufficiently informative to learn the relationship between the inputs and the outputs.

It is crucial to keep in mind that machine learning can only be used to memorize patterns that are present in the training data, so we can only recognize what we have seen before. When using machine learning we are assuming that the future will behave like the past, and this isn't always true.

2. Collect Data

This is the first real step towards the development of a machine learning model: collecting data. It is a critical step that cascades into how good the model will be; the more and better data we get, the better our model will perform.

There are several techniques to collect the data, like web scraping, but they are out of the scope of this article.

Typically our data will have the following shape:

Note: The previous table corresponds to the famous Boston housing dataset, a classical dataset frequently used to develop simple machine learning models. Each row represents a different Boston neighborhood and each column indicates some characteristic of the neighborhood (crime rate, average age, etc.). The last column represents the median house price of the neighborhood and it is the target, the one that will be predicted from the other columns.

3. Choose a Measure of Success

Peter Drucker, management consultant, educator and author of The Effective Executive and Managing Oneself, had a famous saying:

"If you can't measure it you can't improve it".

If you want to control something it should be observable, and in order to achieve success it is essential to define what is considered success: maybe precision? Accuracy? Customer-retention rate?

This measure should be directly aligned with the higher-level goals of the business at hand. It is also directly related to the kind of problem we are facing:
Regression problems use evaluation metrics such as mean squared error (MSE).
Classification problems use evaluation metrics such as precision, accuracy and recall.

In the next articles we'll explore these metrics in depth, see which are the most adequate for the problem faced, and learn how to set them up.

4. Setting an Evaluation Protocol

Once the goal is clear, it should be decided how progress towards that goal is going to be measured. The most common evaluation protocols are:

4.1 Maintaining a hold out validation set

This method consists of setting apart some portion of the data as the test set.

The process would be to train the model on part of the remaining data, tune its parameters with the validation set (a further portion held back from training), and finally evaluate its performance on the test set.

The reason to split the data into three parts is to avoid information leaks: every tuning decision made against a set of data leaks some information about that data into the model, so the final evaluation needs data that played no part in training or tuning.
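
The hold-out protocol described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a definitive implementation: the function names hold_out_split and mean_squared_error, the 60/20/20 proportions, the toy data and the "predict the training mean" baseline are all invented for this example and do not come from the article.

```python
import random


def hold_out_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into train, validation and test lists."""
    if not 0 < val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy (feature, target) pairs, invented for illustration only.
data = [(x, 2.0 * x + 1.0) for x in range(100)]
train, val, test = hold_out_split(data)

# Hypothetical baseline "model": always predict the mean target seen in training.
baseline = sum(y for _, y in train) / len(train)

# The validation score guides tuning; the test score is computed once, at the end.
val_mse = mean_squared_error([y for _, y in val], [baseline] * len(val))
test_mse = mean_squared_error([y for _, y in test], [baseline] * len(test))
print(len(train), len(val), len(test), val_mse, test_mse)
```

The shuffle happens before the split so that any ordering in the collected data does not end up concentrated in one of the three sets. In practice a library routine would usually replace the hand-written split, but the idea is the same: the test rows are never touched while training or tuning.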
