Gradient Descent for Machine Learning

We can use the same equation to represent the regression line in a computer.

If you can't recall it, let me quickly remind you: it's y = M * x + B. Here, M is the slope of the line and B is the Y intercept.

Let's quickly recall what slope and Y intercept mean.

Slope is the amount by which the line rises on the Y axis for every block you move right on the X axis.

This tells us the direction of the line and the rate at which our line is increasing.

Mathematically speaking, this means M = (rise in Y) / (run in X) for a specified amount of distance along the line.

From the dotted lines in the above picture, we can see that for every 2 blocks in the X axis, the line rises by 1 block in the Y axis.

Hence, slope, M = ½ = 0.5. And it's a positive value, which indicates that the line is rising in the upward direction.

Now, let’s come to Y intercept.

It is the distance that tells us exactly where the line cuts the Y axis.

From the above picture, we can see that the line cuts the Y axis at the point (0, 1).

So, the Y intercept (B) in this case is the distance between (0, 0) and (0, 1), which is 1.

Hence, the straight line in the above picture can be represented by the following equation: y = 0.5 * x + 1. Now we know how to represent the regression line in a computer.
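As a quick aside (this snippet is my own illustration, not part of the article's code), the line from the picture can be represented and evaluated in Python like this:

def line(x, m=0.5, b=1.0):
    return m * x + b  # y = M * x + B

print(line(0))  # 1.0 -> the Y intercept
print(line(2))  # 2.0 -> rose by 1 block over 2 blocks, since the slope is 0.5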

Everything seems good so far.

But the biggest question still remains unanswered: "How would the computer know the right values of M and B for drawing the regression line with the minimum error?" That is exactly why we need the gradient descent algorithm.

Gradient descent is a trial-and-error method which iteratively gives us different values of M and B to try.

In each iteration, we will draw a regression line using these values of M and B and will calculate the error for this model.

We will continue until we get the values of M and B such that the error is minimum.

Let's take a more detailed view of the gradient descent algorithm:

Step 1: Start with random values of M and B.
Step 2: Adjust M and B such that the error reduces.
Step 3: Repeat until we get the best values of M and B (until convergence).

By the way, the application of gradient descent is not limited to regression problems only.

It is a general optimization algorithm that can be applied to many kinds of problems.
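To make those three steps concrete, here is a toy sketch (my own illustration, not the article's code). The error function below is made up purely for demonstration, and the adjustment uses simple finite differences; the article derives the exact adjustment for regression in the next section:

import random

def error(m, b):
    # a made-up cost surface whose minimum is at m = 2, b = 1
    return (m - 2)**2 + (b - 1)**2

m, b = random.random(), random.random()  # Step 1: start with random values
alpha, eps = 0.1, 1e-6
for _ in range(100):                     # Step 3: repeat until (near) convergence
    # Step 2: adjust m and b in the direction that reduces the error
    grad_m = (error(m + eps, b) - error(m, b)) / eps
    grad_b = (error(m, b + eps) - error(m, b)) / eps
    m, b = m - alpha * grad_m, b - alpha * grad_b

print(round(m, 3), round(b, 3))  # approaches 2 and 1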

The math behind gradient descent

Till now, we have understood that we will use gradient descent to minimize the error for our model.

But, now let us see exactly how gradient descent finds the best values of M and B for us.

Gradient descent tries to minimize the error.

Right? So, we can say that it tries to minimize the following function, called the cost function. For n data points, it is the mean of the squared errors:

J(M, B) = (1/n) * Σ (y_i − (M * x_i + B))²

At first, we will take random values of M and B.

So, we will get a random error corresponding to these values.

Thus, a random point will be plotted on the above graph.

At this point, there will be some error.

So, our objective will be to reduce this error.

In general, how would you approach the minimum value of a function? By finding its derivative.

Right? The same thing applies here.

We will obtain the partial derivatives of J with respect to M and B.

This will give us the direction of the slope of the tangent at the given point.

We would like to move in the opposite direction of the slope in order to approach the minimum value.
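For the squared-error cost written above, those partial derivatives for a single data point (x, y_actual) work out as follows (dropping constant factors such as 2/n, which only rescale the step and get absorbed into the learning rate):

∂J/∂M ∝ (y_prediction − y_actual) * x = error * x
∂J/∂B ∝ (y_prediction − y_actual) = error

These two quantities are exactly what the code later in this article computes before flipping their sign and scaling them.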

So far, we have only got the direction of the slope and we know we need to move in its opposite direction.

But in each iteration, by how much should we move in the opposite direction? This amount is called the learning rate (alpha).

The learning rate determines the step size of our movement towards the minimal point.

So, choosing the right learning rate is very important.

If the learning rate is too small, it will take more time to converge.

On the other hand, if the learning rate is very high, it may overshoot the minimum point and diverge.
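A tiny standalone example (my own, not from the article) shows both behaviours. For f(x) = x², whose derivative is 2x, a small step size shrinks x toward the minimum at 0, while a large one overshoots and diverges:

def step(x, alpha):
    return x - alpha * 2 * x  # move against the derivative of f(x) = x**2

x_small = x_large = 1.0
for _ in range(5):
    x_small = step(x_small, 0.1)  # converges: x shrinks by a factor of 0.8 each step
    x_large = step(x_large, 1.5)  # diverges: x flips sign and doubles each step

print(x_small, x_large)  # ~0.328 vs -32.0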

To sum up, what we have till now is:

1. A random point is chosen initially by choosing random values of M and B.
2. The direction of the slope at that point is found by computing delta_m and delta_b.
3. Since we want to move in the opposite direction of the slope, we multiply both delta_m and delta_b by -1.
4. Since delta_m and delta_b give us only the direction, we multiply both of them by the learning rate (alpha) to specify the step size of each iteration.
5. Next, we modify the current values of M and B such that the error is reduced.
6. We repeat steps 2 to 5 until we converge at the minimum point.
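Put together, one iteration of steps 2 to 5 for a single data point looks like this (gd_step is just an illustrative helper name of mine, not something defined in the article):

def gd_step(M, B, x, y_actual, alpha):
    error = (M * x + B) - y_actual      # prediction error for this point
    delta_m = -1 * alpha * (error * x)  # move against dJ/dM, scaled by alpha (steps 2-4)
    delta_b = -1 * alpha * error        # move against dJ/dB, scaled by alpha (steps 2-4)
    return M + delta_m, B + delta_b     # step 5: updated M and B reduce the error

print(gd_step(0, 0, 2, 5, 0.01))  # one update from M = 0, B = 0 -> (0.1, 0.05)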

Implementation of gradient descent using Python

This was everything about the gradient descent algorithm.

Now we will implement this algorithm using Python.

Let us first import all the required libraries and read the dataset using the Pandas library (the csv file can be downloaded from this link):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("student_scores.csv")  # Read csv file using Pandas library

Next, we need to read the values of X and Y from the dataframe and create a scatter plot of that data.

X = df["Hours"]   # Read values of X from dataframe
Y = df["Scores"]  # Read values of Y from dataframe

plt.plot(X, Y, 'o')  # 'o' for creating a scatter plot
plt.title("Implementing Gradient Descent")
plt.xlabel("Hours Studied")
plt.ylabel("Student Score")

After that, we will initially choose m = 0 and b = 0.

m = 0
b = 0

Now, we need to create a function (the gradient descent function) which will take the current values of m and b and give us better values of m and b.

def grad_desc(X, Y, m, b):
    for point in zip(X, Y):
        x = point[0]            # value of x for this point
        y_actual = point[1]     # actual value of y for this point
        y_prediction = m*x + b  # predicted value of y for the given x
        error = y_prediction - y_actual    # error in the estimation
        # Using alpha = 0.0005
        delta_m = -1 * (error*x) * 0.0005  # calculating delta m
        delta_b = -1 * (error) * 0.0005    # calculating delta b
        m = m + delta_m  # modifying the value of m to reduce the error
        b = b + delta_b  # modifying the value of b to reduce the error
    return m, b  # returning better values of m and b

Notice that in the above code we are using a learning rate (alpha) of 0.0005. You can modify this value and try this example with different learning rates.
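If you plan to experiment a lot, one convenient variation (an assumption on my part, not how the article writes it) is to pass alpha in as a parameter instead of hard-coding it:

def grad_desc(X, Y, m, b, alpha=0.0005):
    for x, y_actual in zip(X, Y):
        error = (m * x + b) - y_actual  # same error term as above
        m = m - alpha * error * x       # equivalent to m + delta_m
        b = b - alpha * error           # equivalent to b + delta_b
    return m, b

m, b = grad_desc(X, Y, m, b, alpha=0.05)  # try a different learning rate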

Now we will make a function which will help us to plot the regression line on the graph.

def plot_regression_line(X, m, b):
    regression_x = X.values  # list of values of x
    regression_y = []        # list of values of y
    for x in regression_x:
        y = m*x + b             # calculating the y_prediction
        regression_y.append(y)  # adding the predicted value to the list of y
    plt.plot(regression_x, regression_y)  # plot the regression line
    plt.pause(1)  # pause for 1 second before plotting the next line

Now, each time we run the grad_desc() function, we will get a better result for the regression line. Let us create a loop and run the grad_desc() function 10 times and visualize the results.

for i in range(0, 10):
    m, b = grad_desc(X, Y, m, b)   # call grad_desc() to get better m & b
    plot_regression_line(X, m, b)  # plot regression line with m & b

Finally, we need to show the plot by adding the following statement:

plt.show()

So, the full code for our program is:

import pandas as pd
import matplotlib.pyplot as plt

# function for plotting the regression line
def plot_regression_line(X, m, b):
    regression_x = X.values
    regression_y = []
    for x in regression_x:
        y = m*x + b
        regression_y.append(y)
    plt.plot(regression_x, regression_y)
    plt.pause(1)

df = pd.read_csv("student_scores.csv")
X = df["Hours"]
Y = df["Scores"]

plt.plot(X, Y, 'o')
plt.title("Implementing Gradient Descent")
plt.xlabel("Hours Studied")
plt.ylabel("Student Score")

m = 0
b = 0

# gradient descent function
def grad_desc(X, Y, m, b):
    for point in zip(X, Y):
        x = point[0]
        y_actual = point[1]
        y_prediction = m*x + b
        error = y_prediction - y_actual
        delta_m = -1 * (error*x) * 0.0005
        delta_b = -1 * (error) * 0.0005
        m = m + delta_m
        b = b + delta_b
    return m, b

for i in range(0, 10):
    m, b = grad_desc(X, Y, m, b)
    plot_regression_line(X, m, b)

plt.show()

Now let's run the above program for different values of the learning rate (alpha).

For alpha = 0.0005, the output will look like this:

For alpha = 0.05, it will look like this:

For alpha = 1, it will overshoot the minimum point and diverge like this:

The gradient descent algorithm that we discussed in this article is called stochastic gradient descent.

There are also other types of gradient descent algorithms, such as batch gradient descent and mini-batch gradient descent, which we will discuss in future articles.

Till then, stay tuned for the next article :) We also have a video on this topic, which is part of the Introduction to Machine Learning course at Code Heroku.

Please let us know in the comments section if you enjoyed reading this article.
