Machine Learning from First Principles

https://bit.ly/2sLdYKY

It is absolutely vital that the dataset we use meets this criterion, because if it doesn’t, we will end up with a worthless ML model: we cannot reach a high degree of precision without a diverse dataset in the first place.

Training: This part is nearly identical to the previous section, where we covered the models, parameters, and learner, but I want to show visually what some of the results look like so you can look at machine learning models and impress your friends with your analysis of the results! Before I show some examples, there is a very important concept to understand, and that is variance.

Variance is the measure of how “sensitive” the model is to new data.

A bad variance result is a serious problem in machine learning because it means we have built a piece of software that only works in a “vacuum”, or in other words, fails to translate from theoretical math to the real world because it does not take enough parameters into account.

Underfit: low variance / high bias. This means that our model has good data, but the formula we are running it through is too “dumb” and needs to take more things into account.

This is known as bias.

A biased model just ignores things that it should not be ignoring.

This model was too simple and thus didn’t classify properly.

This model was too simple and thus mishandled a few outlier data points.

Overfit: high variance / low bias. This occurs when our data is bad.

Plain and simple.

We don’t have a diverse data set and it is skewed.

However, the model is not too “dumb”; the issue is more with the dataset we chose to train our algorithm on.

This is the most common result of supervised machine learning algorithms.

High variance = too large a spread from each point to the average value.
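To make the underfit/overfit distinction concrete, here is a minimal sketch in Python using scikit-learn. The noisy sine-wave dataset and the polynomial degrees are my own illustrative choices, not something from this article: a degree-1 model is too “dumb” (high bias), while a degree-15 model memorizes the noise (high variance).

```python
# Underfitting vs. overfitting: same data, three polynomial degrees.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, 80).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.3, 80)  # noisy ground truth

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")

# degree 1 underfits (high bias): both errors stay high.
# degree 15 overfits (high variance): train error is tiny, test error blows up.
```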

There you have it, that’s supervised machine learning at a super high level.

What is unsupervised learning and how does it work? Much like its name suggests, this type of machine learning involves minimal human involvement.

Sure, the data needs to be cleaned up and made presentable, but these models are genuinely useful for telling humans about patterns that don’t meet the eye.

One way this is used is in e-commerce.

Companies have enormous amounts of data on customers, both prospective and current.

Every company wants to find new customers to sell its products to.

With massive datasets containing everything they know about a person, companies can apply unsupervised learning to find new ways to sell to these customers, along with new products to sell to them.

Let’s take a look at the most popular algorithm used to do this, k-means clustering:

https://bit.ly/2CJmLSe

Previously, when we were looking at clustering and classification, we saw an example where we grouped objects based on whether or not they were the same shape, NOT on whether they were a valid shape.

In the application of clustering for unsupervised learning, data is grouped together based on how similar it is, not on the category the data falls into (also known as tagging).

For example, in the clustering image I drew out above, you could group flowers according to petal width, independent of the flower type.

With a diverse dataset, you would have a model that groups objects according to their size, and you could infer insights from the final result.

Insights are what companies using these models are after, because it is from this end result that they can infer what new products to offer a customer.

So if you had a dataset showing a bunch of diverse people (diversity defined as age, race, education, income, geographic area, etc.) and you knew how much money they were spending per month on products, you would group those people according to how much they spent per month, regardless of those other factors.

Next, you could sell them more products that other people in their same spending cluster had purchased.

In theory, this would increase sales for the company, since you can reasonably assume that some percentage of these newly recommended products are eventually sold to customers.

How does k-means clustering work? Step 1: Separate your data into clusters according to some chosen metric (in our case, we will use $ spent per month at some company as our clustering metric).

The number of groups we have is represented by the letter “k”.

Hence the name k-means clustering.

Setting of point

Step 2: Choose k points, one at the center of each of these k clusters.
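As a sketch of how steps 1 and 2 often get started in practice, one common approach (my assumption here, since the article doesn’t spell it out) is to pick k existing data points at random as the initial centers. The spending data below is made up for illustration.

```python
# A common way to start steps 1-2: pick k existing data points at random
# as the initial cluster centers.
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0, 300, size=(20, 1))  # $ spent per month, 20 customers
k = 3
centers = points[rng.choice(len(points), size=k, replace=False)]
print(centers.ravel())  # three starting centers, one per cluster
```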

Step 3: Move each of the k-points we originally set to a new point.

This new point is calculated by measuring the distance of each point from the center we originally set.

This is measured using that handy-dandy formula we all learned in algebra, the Euclidean distance, or in simpler terms, the Pythagorean theorem.

dist((x, y), (a, b)) = √((x − a)² + (y − b)²)

Now, the new point will move to the nearest cluster center.

Note: the new point does not move to whichever point happens to be closest, since that is not a cluster but a single data point.

To do this, we add up the distances from the original point to each point in a cluster (in our case, a cluster is defined as 3 points) and find the cluster with the smallest sum.

Once we know which cluster is closest, we have an idea of which cluster region our new point is going to be in, but this does not tell us exactly where to move the point.
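Before moving on, the distance formula above drops straight into code. The helper name euclidean is just my choice for this sketch; Python’s standard-library math.dist computes the same thing.

```python
# The Euclidean distance formula from above, as a small helper function.
import math

def euclidean(p, q):
    """Distance between points p = (x, y) and q = (a, b)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

print(euclidean((0, 0), (3, 4)))  # 5.0 -- the classic 3-4-5 right triangle
print(math.dist((0, 0), (3, 4)))  # 5.0 -- same result from the standard library
```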

Step 4: To calculate where to move our new point, we compute the mean of the points in our cluster.

The mean, as mentioned earlier, is just the sum of those points’ values divided by the number of points we have.
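In code, that mean is essentially one line; the sample cluster below is made up for illustration.

```python
# The centroid of a cluster is the coordinate-wise mean of its points.
import numpy as np

cluster = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # made-up points
centroid = cluster.mean(axis=0)  # sum each coordinate, divide by the count
print(centroid)  # [2. 4.]
```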

Movement of point (qualitative estimate)

Step 5: This is not really a step per se, but at this stage we keep running the previous steps of the algorithm until we can no longer move our point to a more advantageous position.

In other words, we are not going to change the point unless there is a cluster that has a better distance/mean than the previously calculated one.
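Putting steps 1 through 5 together, here is a from-scratch sketch of the whole loop, assuming 2-D points and a fixed k. It is illustrative only; a real project would typically reach for sklearn.cluster.KMeans instead.

```python
# A from-scratch sketch of k-means (steps 1-5 above). Illustrative only.
import numpy as np

def kmeans(points, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: pick k random data points as the initial cluster centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each center to the mean of the points assigned to it.
        new_centers = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        # Step 5: stop once no center moves anymore.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Calling kmeans(data, k=3) on an (n, 2) NumPy array returns the final centers plus each point’s cluster label.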

Let’s take a step back for a second, though, because I had to explain this process (the algorithm) before making the main point, to eliminate any confusion.

Remember how I said that we choose k points in the beginning as our starting basis? Well, the entire point of unsupervised learning is to give humans insights that they may not be aware of.

We cannot do that with great precision without changing the positions of the k points on each iteration.

To keep things brief: these k points change each time the above steps are completed, and the algorithm keeps doing this until it runs out of better places to put the points (or hits a few other edge cases that I am going to ignore for the sake of brevity).

Sooo why do companies pay engineers to run these algorithms? At the end of our algorithm, we will have a final clustering that is the optimal result for the parameters (the different attributes we chose to evaluate).

In that cluster, we could have an output like this one, for example:

David is a 26-year-old male who lives in San Francisco and works for Uber.

He spends $100 per month on clothes and usually buys Clarks leather boots.

Melanie is a 19-year-old woman who lives in New York City and works for a non-profit.

She spends $150 per month on clothes and usually buys AG jeans.

Marcus is a 40-year-old male who lives in Austin, Texas, and works for ExxonMobil.

He spends $50 per month on clothes and usually buys Carhartt t-shirts.

Susan is a 25-year-old woman who lives in Seattle, Washington, and works for Amazon.

She spends $200 a month on clothes and usually buys Frame denim.

Even though these four people seemingly have nothing in common, they could actually be in the market for the goods the others buy; they just don’t know they want those goods yet, since there is no ad or person telling them about them.

Well, David and Marcus have similar spending habits, as do Melanie and Susan, despite living in different cities and being different ages.

By analyzing their data, a company could serve them ads based on the “similar shopper” insights it just generated, and it would probably see some quantifiable conversion rate from showing an ad to the customer buying that good.
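As a rough sketch of what that looks like in code, here is scikit-learn’s KMeans run on just the monthly spend figures from the four shoppers above; choosing k=2 is my own assumption for this toy example.

```python
# Clustering the four shoppers above by monthly clothing spend.
import numpy as np
from sklearn.cluster import KMeans

names = ["David", "Melanie", "Marcus", "Susan"]
spend = np.array([[100.0], [150.0], [50.0], [200.0]])  # $ per month, from above

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(spend)
for name, cluster in zip(names, labels):
    print(f"{name}: cluster {cluster}")

# David and Marcus end up in one cluster, Melanie and Susan in the other,
# so ads can be targeted within each "similar shopper" group.
```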

These mathematical models, and the engineers who deploy them, cost relatively little to build, employ, and maintain compared to the potential upside.

At scale, you would probably see the resulting sales far outweigh the cost of building these models, so…

This is why it makes sense for companies to use unsupervised learning models like k-means!

Where can I go from here? Believe it or not, we have only scratched the surface of machine learning.

If you look at that snapshot of AI above, you can see that we really only covered a few topics at a super high level.

Some other interesting topics within machine learning shown in that image include NLP (natural language processing), which is how Amazon’s Alexa processes what you say; CV (computer vision), which is how airports scan people going through security to see if they are on any watch list(s); and many more.

If you really want to dive into any of these subjects, you should seriously check out MOOCs (massive open online courses) on platforms such as Udacity, Udemy, and Coursera.

If you found this article helpful, please give it some claps using the clap button (hold it down) and follow me! Thanks for reading!

-Connor
