Intro to Statistics — Looking at Data

')
# Linear Regression
# (continues the script above: uses its stats, np and plt imports and the size and cost arrays)
gradient, intercept, r_value, p_value, std_err = stats.linregress(size, cost)
print("Gradient and intercept", gradient, intercept)
print("R-squared", r_value**2)
print("p-value", p_value)
print("Standard Error", std_err)

# Points along the fitted line, used to draw it
lr_x = np.linspace(1000, 2500, 100)
lr_y = gradient*lr_x + intercept
plt.plot(lr_x, lr_y, 'blue')

# Build figure
plt.xlim([1000, 2600])
plt.ylim([80000, 200000])
plt.savefig('values1b.png')

Valuing Houses: Plot 1 with linear regression line

The code also provides the following values from the calculated linear regression:

Gradient and intercept 80.0 -2.91038304567e-11
R-squared 1.0
p-value 1.5e-40
Standard Error 0.0

The line gradient is 80, which makes sense with the data set: a 1300 ft² house implies a price of $104,000, which is 80 multiplied by 1300.

The intercept point is almost zero (negligible when compared with the y-axis scales).
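To see where a gradient and intercept like these come from, here is a minimal sketch of the ordinary least-squares formulas in plain Python. The `sizes` and `prices` lists are hypothetical sample data chosen to lie exactly on price = 80 × size (they are not taken from the post's own data set), and `fit_line` is an illustrative helper name, not a SciPy function.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Hypothetical data: every price is exactly 80 times the size
sizes = [1100, 1300, 1400, 1800, 1900, 2400]
prices = [80 * s for s in sizes]

gradient, intercept = fit_line(sizes, prices)
print("Gradient and intercept", gradient, intercept)
```

On perfectly linear data like this, the fitted gradient recovers the factor of 80 and the intercept comes out at (or within rounding error of) zero.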

About R-squared, Null Hypothesis, p-values and standard error

Our first example is an excellent opportunity to explore our first statistical measures. Linear regression not only provides an estimated model of our data; it also lets us measure how accurate and reliable that model is.

I will not review these concepts in depth, as they are explained much better elsewhere (I will provide links), but I will at least mention them and explain how they help you judge whether the model you have used is reliable or not.

- R-squared is the coefficient of determination, a statistical measure of how close the data are to the fitted regression line. It ranges between 0 and 1 and represents how well the linear regression model fits the data. A value close to 1 means that the model fits the data very well.

In this particular scenario it is 1 because the line actually passes through every data point, so it is the best we can have.

- p-values cannot be explained without first understanding the Null Hypothesis. In inferential statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups. So, by default, you assume that your dependent variable is unrelated to your independent variables. It is like the presumption of innocence: everybody is not guilty until proven otherwise. Wikipedia provides a good, easy-to-follow explanation of the Null Hypothesis.

- p-values help you determine the significance of your results/model. A p-value is the probability of obtaining results at least as extreme as the ones observed if the Null Hypothesis were true, so it tells you whether you can reject the Null Hypothesis or not. Small values (by convention, less than 0.05) indicate strong evidence against the Null Hypothesis. Larger values (more than 0.05) indicate that your model/analysis failed to reject the Null Hypothesis, so it has not shown a dependency between the variables. More on p-values can be found here.

In this example, the p-value can be considered almost zero, so we can be quite sure that there is a dependency between the independent and dependent variables.

- The standard error of a linear regression represents the average distance that the observed values fall from the regression line. A proper definition can be found here.

In our case it is zero because all the data fit our model perfectly. (Strictly speaking, the std_err value returned by stats.linregress is the standard error of the estimated gradient, which is also zero when every point lies on the line.)
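The measures described above can be computed by hand, which may help make them less abstract. The following is a minimal sketch in plain Python, not the SciPy implementation: `fit_and_score` is an illustrative helper name, the `sizes` list is hypothetical sample data, and the `noise` values are made up purely to show what happens when the points no longer sit exactly on a line. It returns R-squared, the residual standard error (the typical distance of the points from the line) and the standard error of the gradient, which is what SciPy's `linregress` documents as its standard error result.

```python
import math


def fit_and_score(xs, ys):
    """Least-squares line plus R-squared and standard errors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x

    # Residual and total sums of squares
    ss_res = sum((y - (gradient * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot

    # Typical distance of the points from the line (n - 2 degrees of freedom)
    residual_std_error = math.sqrt(ss_res / (n - 2))
    # Standard error of the estimated gradient
    gradient_std_error = residual_std_error / math.sqrt(s_xx)
    return r_squared, residual_std_error, gradient_std_error


# Hypothetical data: perfectly linear prices, then the same prices with made-up noise
sizes = [1100, 1300, 1400, 1800, 1900, 2400]
perfect_prices = [80 * s for s in sizes]
noise = [4000, -3000, 2000, -5000, 3000, -1000]
noisy_prices = [p + e for p, e in zip(perfect_prices, noise)]

perfect = fit_and_score(sizes, perfect_prices)
noisy = fit_and_score(sizes, noisy_prices)
print("Perfect data:", perfect)
print("Noisy data:  ", noisy)
```

With the perfectly linear prices, R-squared is 1 and both standard errors are zero, matching the values reported earlier. With the noisy prices, R-squared drops slightly below 1 and the standard errors become positive.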

Your model is ready

Quiz: Valuing Houses 3

Now that you have come up with a model, you can accurately estimate other prices, such as that of a 2100 ft² house. As the gradient is 80 and the intercept is zero, getting the value is just a matter of multiplying the gradient by the square footage.

p = size*gradient + intercept = 2100*80 = 168000

This completes the Lesson 2 review.
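For reference, the quiz estimate can be reproduced with a small helper. This is only a sketch: `predict_price` is a hypothetical function name, and its default gradient of 80 and intercept of 0 are the rounded values from the regression output above.

```python
def predict_price(size_sqft, gradient=80.0, intercept=0.0):
    """Estimate a price from the fitted line: price = gradient * size + intercept."""
    return gradient * size_sqft + intercept


estimate = predict_price(2100)
print(estimate)  # 168000.0
```

Keeping the gradient and intercept as parameters means the same helper still works if the model is refitted on different data.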

In the course, all example prices are calculated on the assumption that the data are linear. This post provides a more accurate explanation of how the linear fit is actually calculated and how linear regression behaves on a perfectly linear data set (something that will almost never happen in the real world).
