Data Science as Baking

Take the time to format your data correctly and you will find a tremendous tool.

Use the best ingredients

If you have the choice between analyzing fewer high quality data points and many low quality data points, choose the higher quality. This way you can find better underlying patterns faster. It also makes it easier to follow up after you have cleaned the lower quality data. (More is not always better, because more data brings more potential for error.)

Don't under or over work your data

Under worked data has gaps, unjustified outliers, and just plain wrong formatting.

Example: You are looking at student GPA for a special intervention program. The data has some GPAs reported on a 4 point scale, i.e., 3.0 is a B average, and some GPAs reported as a percentage out of 100, i.e., 80% is a B average. Clearly, reporting an average GPA in this mixed format will give you a number that is not meaningful on either scale. You must choose a scale and convert all of the data to it. Also, make sure to account for outliers like A.P. students, who may have a GPA above 4.0. These values may be correct, but they should be examined within the context of the problem.
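To make the GPA example concrete, here is a minimal Python sketch of putting mixed data on one scale. It rests on assumptions that are not part of the example above: the helper names (`to_four_point`, `normalize_gpas`), the sample values, the cutoff used to guess which numbers are percentages, and the linear conversion rule. The rule is chosen only so that 80% maps to 3.0, matching the B average described above; a real school may publish a different conversion table.

```python
from statistics import mean

# Assumption: any reported value above this cutoff is a percentage, not a 4 point GPA.
PERCENT_CUTOFF = 10.0


def to_four_point(value):
    """Convert one reported GPA to the 4 point scale."""
    if value > PERCENT_CUTOFF:
        # Hypothetical linear rule anchored on 80% == 3.0 (a B average).
        return max(0.0, (value - 50.0) / 10.0)
    return value


def normalize_gpas(raw):
    """Return the converted GPAs plus any values above 4.0 that need a second look."""
    converted = [to_four_point(v) for v in raw]
    flagged = [g for g in converted if g > 4.0]
    return converted, flagged


# Made-up sample: a mix of 4 point values, percentages, and one weighted A.P. GPA.
raw_gpas = [3.0, 80.0, 3.5, 90.0, 4.3]
converted, flagged = normalize_gpas(raw_gpas)

print("Mean of mixed data:", mean(raw_gpas))       # not meaningful on either scale
print("Mean after conversion:", mean(converted))   # now on the 4 point scale
print("Values to examine in context:", flagged)
```

Flagged values are reported rather than dropped, since a GPA above 4.0 may be perfectly correct for an A.P. student.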
