Linear Regression — Python Implementation

I will walk through both a simple and a multiple linear regression implementation in Python, and I will show how to assess the quality of the parameters and of the overall model in both situations.

You can grab the code and the data here.

I strongly recommend that you follow along and recreate the steps in your own Jupyter notebook to take full advantage of this tutorial.

If you are ready, let’s do this!

Introduction

The data set contains information about money spent on advertising and the resulting sales. … Hence, we remove it.

```python
# drop() returns a new DataFrame, so assign the result back to data
data = data.drop(['Unnamed: 0'], axis=1)
```

Alright, our data is clean and ready for linear regression!

Simple Linear Regression

Modelling

For simple linear regression, let’s consider only the effect of TV ads on sales. … In this case, we have:

[Figure: Simple linear regression equation]

Let’s visualize how the line fits the data.

```python
import matplotlib.pyplot as plt

# reg is the fitted model and X the TV feature from the modelling step
predictions = reg.predict(X)

plt.figure(figsize=(16, 8))
plt.scatter(data['TV'], data['sales'], c='black')          # observed data
plt.plot(data['TV'], predictions, c='blue', linewidth=2)   # fitted line
plt.xlabel("Money spent on TV ads ($)")
plt.ylabel("Sales ($)")
plt.show()
```

And now, you see:

[Figure: Linear fit]

From the graph above, it seems that a simple linear regression can explain the general relationship between the amount spent on TV ads and sales.

Assessing the relevancy of the model

Now, if you remember from this post, to see if the model is any good, we need to look at the R² value and the p-value of each coefficient.

Here’s how we do it:

```python
import statsmodels.api as sm

X = data['TV']
y = data['sales']

X2 = sm.add_constant(X)  # add an intercept column to the feature
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())
```

Which gives you this lovely output:

[Figure: R² and p-value]

Looking at both coefficients, we have a p-value that is very low (although it is probably not exactly 0). … Surely, spending on newspaper and radio ads must have a certain impact on sales.

Let’s see if a multiple linear regression will perform better.

Multiple Linear Regression

Modelling

Just like for simple linear regression, we will define our features and target variable and use the scikit-learn library to perform linear regression.

```python
from sklearn.linear_model import LinearRegression

# Features: every column except the target ('Unnamed: 0' was already dropped above)
Xs = data.drop(['sales'], axis=1)
# A pandas Series has no reshape(); reshape the underlying NumPy array instead
y = data['sales'].values.reshape(-1, 1)

reg = LinearRegression()
reg.fit(Xs, y)

print("The linear model is: Y = {:.5} + {:.5}*TV + {:.5}*radio + {:.5}*newspaper".format(
    reg.intercept_[0], reg.coef_[0][0], reg.coef_[0][1], reg.coef_[0][2]))
```

Nothing more!
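To give some intuition for what `reg.fit` computes in the multiple case, here is a minimal sketch of ordinary least squares in plain NumPy. It runs on made-up data: the spend ranges, the "true" coefficients (3.0, 0.045, 0.19 and 0.0) and the noise level are all hypothetical and are not the article’s data set. It solves for the intercept and the three coefficients with `np.linalg.lstsq`, then computes R² as one minus the ratio of the residual sum of squares to the total sum of squares. scikit-learn’s `LinearRegression` also fits an ordinary least squares model, so on the same data it should give the same coefficients.

```python
import numpy as np

# Hypothetical synthetic data standing in for the advertising set:
# 200 rows, three spend columns, and known "true" coefficients.
rng = np.random.default_rng(0)
n = 200
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
newspaper = rng.uniform(0, 100, n)
noise = rng.normal(0, 1.0, n)
sales = 3.0 + 0.045 * tv + 0.19 * radio + 0.0 * newspaper + noise

# Design matrix: a column of ones for the intercept, then one column per feature.
X = np.column_stack([np.ones(n), tv, radio, newspaper])

# Ordinary least squares: find beta minimising ||X @ beta - sales||^2.
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
intercept, b_tv, b_radio, b_news = beta

# R^2 = 1 - SS_res / SS_tot
fitted = X @ beta
ss_res = np.sum((sales - fitted) ** 2)
ss_tot = np.sum((sales - sales.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("Y = {:.5} + {:.5}*TV + {:.5}*radio + {:.5}*newspaper".format(
    intercept, b_tv, b_radio, b_news))
print("R^2 = {:.3f}".format(r2))
```

Because newspaper spend was given a zero coefficient when the data was generated, its estimate should land close to zero. On real data, deciding whether such a small coefficient matters is exactly what the p-values are for.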

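The p-values in the statsmodels summary come from each coefficient’s standard error. The sketch below shows where those numbers come from by computing standard errors and t-statistics by hand. It assumes the same kind of hypothetical synthetic data, again not the article’s data set, in which sales depend on TV and radio but not on newspaper. One simplification: it converts each t-statistic into a two-sided p-value using a normal approximation via `math.erfc`, whereas statsmodels uses the exact t distribution with n − p degrees of freedom. With 200 rows the two are close.

```python
import math
import numpy as np

# Hypothetical synthetic data (not the article's data set): sales depend on
# TV and radio spend but, by construction, not on newspaper spend.
rng = np.random.default_rng(1)
n = 200
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
newspaper = rng.uniform(0, 100, n)
sales = 3.0 + 0.045 * tv + 0.19 * radio + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), tv, radio, newspaper])
names = ["const", "TV", "radio", "newspaper"]
n_obs, n_params = X.shape

beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
residuals = sales - X @ beta

# Unbiased estimate of the noise variance, with n - p degrees of freedom.
sigma2 = residuals @ residuals / (n_obs - n_params)

# Standard errors: square roots of the diagonal of sigma^2 * (X^T X)^{-1}.
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stat = beta / std_err

# Two-sided p-values from a normal approximation to the t distribution.
p_values = [math.erfc(abs(t) / math.sqrt(2)) for t in t_stat]

for name, b, se, t, p in zip(names, beta, std_err, t_stat, p_values):
    print(f"{name:>10}  coef={b:9.4f}  se={se:.4f}  t={t:8.2f}  p~{p:.3g}")
```

For the real data, `sm.OLS(data['sales'], sm.add_constant(Xs)).fit().summary()` reports the exact versions of these columns, along with R².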