Intro to Statistics: Looking at Data

```python
# Linear regression
# (np is numpy, plt is matplotlib.pyplot, stats is scipy.stats;
#  size and cost are the data arrays defined earlier in the script)
gradient, intercept, r_value, p_value, std_err = stats.linregress(size, cost)
print("Gradient and intercept", gradient, intercept)
print("R-squared", r_value**2)
print("p-value", p_value)
print("Standard Error", std_err)

# Evaluate the fitted line over the plotted range
lr_x = np.linspace(1000, 2500, 100)
lr_y = gradient * lr_x + intercept
plt.plot(lr_x, lr_y, 'blue')

# Build figure
plt.xlim([1000, 2600])
plt.ylim([80000, 200000])
plt.savefig('values1b.png')
```

Valuing Houses: Plot 1 with linear regression line

The code also provides the following values from the calculated linear regression:

```
Gradient and intercept 80.0 -2.91038304567e-11
R-squared 1.0
p-value 1.5e-40
Standard Error 0.0
```

The line gradient is 80, which makes sense with the data set: a 1300 ft² house implies a price of $104,000, which is 80 multiplied by 1300.

The intercept is almost zero (negligible compared with the y-axis scale).
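To make the arithmetic behind these numbers concrete, here is a minimal sketch of an ordinary least-squares fit written in plain Python. The `sizes` and `costs` lists are hypothetical stand-ins (priced at exactly $80 per ft²), not the article's actual data set, and `fit_line` is an illustrative helper rather than a SciPy function.

```python
# Hypothetical data: every house costs exactly $80 per square foot.
sizes = [1000.0, 1300.0, 1600.0, 1900.0, 2200.0, 2500.0]
costs = [80.0 * s for s in sizes]


def fit_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Gradient = covariance(x, y) / variance(x); the 1/n factors cancel.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    # The fitted line always passes through the point of means.
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


gradient, intercept = fit_line(sizes, costs)
predicted = gradient * 1300 + intercept  # price implied for a 1300 ft² house
print(gradient, intercept, predicted)
```

With data this clean the fit recovers a gradient of 80 and an intercept of zero, so the predicted price is simply 80 multiplied by 1300.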

About R-squared, Null Hypothesis, p-values and standard error

Our first example is an excellent opportunity to explore our first statistical measures.

Linear regression not only provides an estimated model of our data; it also lets us measure how accurate and reliable that model is.

I will not review these concepts in depth, as they are explained much better elsewhere (I will provide links), but I will at least mention them and state how they can help you judge whether the model you have used is reliable.

- R-squared is the coefficient of determination. It is a statistical measure of how close the data are to the fitted regression line. It ranges between 0 and 1, and a value close to 1 means that the model fits the data very well. In this particular scenario it is 1 because the line actually passes through every data point, so it is the best we can have.
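As a rough sketch of what R-squared measures, the snippet below computes it as one minus the ratio of the residual sum of squares to the total sum of squares. The data sets and the helper name `fit_with_r_squared` are assumptions made for illustration; the second data set adds made-up deviations so that the line no longer passes through every point.

```python
def fit_with_r_squared(xs, ys):
    """Fit a least-squares line and return (gradient, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    # Residual sum of squares: the variation the line fails to explain.
    ss_res = sum((y - (gradient * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: the spread of the data around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return gradient, intercept, 1.0 - ss_res / ss_tot


sizes = [1000.0, 1300.0, 1600.0, 1900.0, 2200.0, 2500.0]

# Hypothetical perfect data: the line crosses every point.
perfect_costs = [80.0 * s for s in sizes]
_, _, r2_perfect = fit_with_r_squared(sizes, perfect_costs)
print(r2_perfect)  # 1.0

# Hypothetical noisy data: same trend, but prices deviate from the line.
noise = [9000.0, -12000.0, 7000.0, -5000.0, 11000.0, -10000.0]
noisy_costs = [80.0 * s + e for s, e in zip(sizes, noise)]
_, _, r2_noisy = fit_with_r_squared(sizes, noisy_costs)
print(r2_noisy)  # below 1, because the line misses some points
```

The further the points scatter around the fitted line, the larger the residual sum of squares becomes and the lower R-squared falls.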

- p-values cannot be explained without first understanding the null hypothesis. In inferential statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups. So, by default, you assume that your dependent variables are completely unrelated to your independent variables. It is like the presumption of innocence: everybody is not guilty until proven otherwise. Wikipedia provides a good, easy-to-follow explanation of the null hypothesis.

- p-values help you determine the significance of your results/model. A p-value is the probability of seeing a result at least as extreme as yours if the null hypothesis were true, so it tells you whether you can reject the null hypothesis. Small values (conventionally, less than 0.05) indicate strong evidence against the null hypothesis. Larger values (more than 0.05) indicate that your model/analysis failed to reject the null hypothesis, so it has not demonstrated a dependency between the variables.
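One way to see what a p-value means is to simulate the null hypothesis directly. The sketch below is a permutation test, which is a different technique from the t-distribution-based test that `stats.linregress` reports: it reassigns the prices to the houses in every possible order (so that, by construction, size and price are unrelated) and counts how often the reordered data look at least as correlated as the real data. The data and the function names are hypothetical, chosen only for illustration.

```python
from itertools import permutations
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def permutation_p_value(xs, ys):
    """Exact two-sided permutation p-value for 'x and y are unrelated'."""
    observed = abs(correlation(xs, ys))
    count = 0
    total = 0
    # Under the null hypothesis every reordering of ys is equally likely.
    for shuffled in permutations(ys):
        total += 1
        # A small tolerance guards against floating-point rounding.
        if abs(correlation(xs, shuffled)) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical data: six houses priced at exactly $80 per square foot.
sizes = [1000.0, 1300.0, 1600.0, 1900.0, 2200.0, 2500.0]
costs = [80.0 * s for s in sizes]

p_value = permutation_p_value(sizes, costs)
print(p_value)
if p_value < 0.05:
    print("Reject the null hypothesis: size and cost appear related")
else:
    print("Failed to reject the null hypothesis")
```

Because there are only 720 orderings of six prices, this approach cannot produce extremely small p-values, but it shows the logic: if almost no reordering matches the observed correlation, the "no relationship" assumption is hard to defend.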
