# Regression or Classification? Linear or Logistic?

Understanding the differences & the various models for each

Taylor Fogarty, Jun 11

## Regression vs Classification

In order to decide whether to use a regression or classification model, the first question you should ask yourself is: is your target variable a quantity, a probability of a binary category, or a label?

If it’s one of the former two options, then you should use a regression model.

This means that if you’re trying to predict quantities like height, income, price, or scores, you should be using a model that will output a continuous number.

Or, if the target is the probability of an observation being a binary label (e.g. the probability of being good instead of bad), then you should also choose a regression model, but the models you use will be slightly different.

These models are evaluated by the mean squared error (MSE, analogous to the variance) and the root mean squared error (RMSE, analogous to the standard deviation) to quantify the amount of error within the model.
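
To make these two metrics concrete, here is a minimal sketch in plain Python. The heights and predictions are made-up numbers for illustration, and the function names are my own rather than any particular library's.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

# Hypothetical heights in cm: actual values and a model's predictions.
actual = [170.0, 165.0, 180.0, 175.0]
predicted = [172.0, 163.0, 178.0, 177.0]

mse = mean_squared_error(actual, predicted)        # every error is 2 cm, so MSE is 4.0
rmse = root_mean_squared_error(actual, predicted)  # 2.0, back in cm
```

Because the RMSE is in the same units as the target, it is usually the easier of the two to interpret.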

If it’s the latter option, you want to use a classification model.

This method is useful for predicting the label of an observation.

The tricky part is sometimes realizing whether the target is a label or not.

For example, if the target is an ordinal variable such as a discrete ranking from 1 to 5, then these are labels, but they do still have mathematical meaning.

This means that the mean and variance of the data can still be insightful, but in order to predict, you’ll be better off using classification.

These models are evaluated by the F-score or the accuracy of the model instead of the MSE and RMSE.

This score gives an understanding of how many observations were correctly labeled and can be visualized with a confusion matrix that separates the observations into true positives/negatives and false negatives/positives.
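
As a rough sketch of how those four cells and the resulting scores are computed, assuming binary labels where 1 is the positive class (the labels below are invented for illustration):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 if either is undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical true labels and predicted labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)  # 3, 1, 1, 3
accuracy = (tp + tn) / len(y_true)                 # 0.75
f1 = f1_score(tp, fp, fn)                          # 0.75
```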

[Figure: Confusion Matrix]

It’s important to understand the characteristics of your target variable before you begin running models and forming predictions.

If you use regression when you should use classification, you’ll have continuous predictions instead of discrete labels, resulting in a low (if not zero) F-score since most (if not all) of the predictions will be something other than the 1 or 0 you want to predict.

A way around this is to create a cutoff if you’re using a logistic model that’s giving you probabilities.

For example, maybe you decide that anything over a 0.9 is a 1 and anything below it is a 0.

By doing this, you can still find an F-score and see the confusion matrix.
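
A cutoff like this takes only a line or two of code. In this sketch the probabilities are hypothetical model outputs, and a value of exactly 0.9 is treated as a 0, which is my assumption since the rule above does not say.

```python
def apply_cutoff(probabilities, cutoff=0.9):
    """Label anything over the cutoff as 1 and everything else as 0."""
    return [1 if p > cutoff else 0 for p in probabilities]

# Hypothetical probabilities from a logistic model.
probs = [0.97, 0.42, 0.91, 0.88, 0.05]
labels = apply_cutoff(probs)  # [1, 0, 1, 0, 0]
```

The resulting labels can then be compared with the true labels to build the confusion matrix and F-score.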

However, this extra step can typically be avoided by using an appropriate model.

Once you’ve determined which method to use, the next step is to choose the model you’re going to use to generate your predictions.

[Figure: Regression vs Classification]

## Regression Models

Of the regression models, the most popular two are linear and logistic models.

A basic linear model follows the famous equation y=mx+b, but is typically written slightly differently as:

y=β₀+β₁x₁+…+βᵢxᵢ

where β₀ is the y-intercept, the y-value when all explanatory variables are set to zero.

β₁ to βᵢ are the coefficients for variables x₁ to xᵢ, the amount y increases or decreases with a one unit change in that variable, assuming that all other variables are held constant.

For example, if the equation was y=1+2x₁+3x₂ then y would increase from 1 to 3 if x₁ increased from 0 to 1 and x₂ stayed at 0.
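
The same arithmetic as a small sketch, with the coefficients taken from the example equation (the function name is just illustrative):

```python
def linear_predict(intercept, coefficients, features):
    """Compute y = b0 + b1*x1 + ... + bi*xi for one observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(b * x for b, x in zip(coefficients, features))

beta_0 = 1.0
betas = [2.0, 3.0]  # coefficients for x1 and x2

y_at_origin = linear_predict(beta_0, betas, [0.0, 0.0])   # 1.0, the intercept
y_after_step = linear_predict(beta_0, betas, [1.0, 0.0])  # 3.0, one unit more of x1
```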

A logistic model follows a slightly altered equation:

y=1/(1+e^(-(β₀+β₁x₁+…+βᵢxᵢ)))

which constrains it to values between 0 and 1.

For this reason, it’s mostly used for binary target variables where the possible values are zero or one or where the target is the probability of a binary variable.

As mentioned earlier, the equation keeps predictions from being illogical in the sense of having probabilities below 0 or higher than 1.
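
Here is a minimal sketch of that equation, assuming made-up coefficients. The only extra detail is the branch on the sign of z, which is a common trick to avoid overflow for very negative inputs and not part of the model itself.

```python
import math

def logistic_predict(intercept, coefficients, features):
    """Compute 1 / (1 + e^-(b0 + b1*x1 + ... + bi*xi)) for one observation."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Algebraically identical form that avoids overflow when z is very negative.
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

# Hypothetical model: intercept -1, one feature with coefficient 2.
predictions = [logistic_predict(-1.0, [2.0], [x]) for x in (-5.0, 0.0, 5.0)]
```

Whatever value the linear part takes, the output stays between 0 and 1, and a linear part of exactly zero gives a probability of 0.5.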

[Figure: Linear vs Logistic]

You can alter both of these standard models in order to better fit your data.

The main way to do this is to include penalties.

For both linear and logistic models, the fitted equation is going to include every variable you input into it, which is an easy way to overfit your model.

By overfitting your model, you are decreasing its usefulness for generating predictions outside your training sample.

To avoid this, you can either do a feature selection procedure to pick out the significant features or you can include penalties within your model.

[Figure: Underfitting and Overfitting]

Adding an L2 penalty gives a Ridge regression, which shrinks the coefficients of insignificant variables to limit their importance but still includes all the variables that are input.

This is useful if you want every variable to be included regardless of how important it is, but in most cases, you want the simplest model possible.

Adding an L1 penalty instead gives a LASSO regression (Least Absolute Shrinkage and Selection Operator), which does the same thing as Ridge but shrinks the coefficients all the way to zero if they are not significant, effectively removing them.
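
To see why one penalty only shrinks while the other can remove a variable, it helps to look at a deliberately simplified special case: a single feature scaled so that the sum of its squared values is 1, with the objectives written as ½Σ(y−xβ)² + ½λβ² for Ridge and ½Σ(y−xβ)² + λ|β| for LASSO. Under those assumptions (mine, for illustration, not a general solver) both solutions have a closed form in terms of the unpenalized coefficient:

```python
def ridge_shrink(ols_coefficient, penalty):
    """Ridge in the one-feature case: proportional shrinkage, never exactly zero."""
    return ols_coefficient / (1.0 + penalty)

def lasso_shrink(ols_coefficient, penalty):
    """LASSO in the one-feature case: soft-thresholding, which can hit exactly zero."""
    magnitude = max(abs(ols_coefficient) - penalty, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if ols_coefficient > 0 else -magnitude

penalty = 0.5
weak, strong = 0.3, 2.0  # hypothetical unpenalized coefficients

ridge_weak = ridge_shrink(weak, penalty)      # smaller, but still nonzero
lasso_weak = lasso_shrink(weak, penalty)      # exactly 0.0, so the variable drops out
lasso_strong = lasso_shrink(strong, penalty)  # 1.5, shrunk but kept
```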

The disadvantage of LASSO is that if you have more variables (k) than observations (n), it will only include up to n variables.

Also, LASSO struggles with correlated variables and tends to keep just one of them, more or less arbitrarily.

To overcome these obstacles, you can use Elastic Net regression, which combines the two penalties and better handles high dimensional data and multicollinearity.

This will typically give an equally or more accurate model than LASSO, but this depends on the L1 ratio chosen as one of the Elastic Net’s hyper-parameters.
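
The role of the L1 ratio is easiest to see in the penalty term itself. This sketch follows the parameterization used in scikit-learn's `ElasticNet` documentation (`alpha` for overall strength, `l1_ratio` for the mix); other libraries scale the two parts differently, so treat the exact constants as an assumption.

```python
def elastic_net_penalty(coefficients, alpha, l1_ratio):
    """Penalty added to the loss: a weighted mix of the L1 and L2 penalties."""
    l1_part = sum(abs(b) for b in coefficients)   # LASSO-style term
    l2_part = sum(b * b for b in coefficients)    # Ridge-style term
    return alpha * l1_ratio * l1_part + 0.5 * alpha * (1.0 - l1_ratio) * l2_part

coefs = [3.0, -4.0]  # hypothetical coefficients

pure_l1 = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=1.0)  # 7.0, LASSO only
pure_l2 = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.0)  # 12.5, Ridge only
mixed = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.5)    # 9.75, half of each
```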

Lastly, there is the case where the target variable may not be a strictly linear function of the explanatory variables.

Here, we have two main options: higher order regression or a random forest regressor.

Let’s say while conducting your initial data exploration, you find that when predicting income, age has more of a quadratic relationship with income than a linear relationship.

In this case, you want to include a second-order variable in your once linear equation.

It would then look like y=β₀+β₁x+β₂x² and you would run the model again.

You can still run a Linear Regression on a higher order model.

A common misunderstanding is that only linear functions can be created with linear regression methods.

The “linear” in linear regression refers to the model being linear in its coefficients, not in the variables themselves, so it is advantageous to include higher orders or interactions in the model if they help explain the relationship better.
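
As a sketch of what that means in practice, the squared term is just one more column in the design matrix, and ordinary least squares does the rest. The example below solves the normal equations by hand in plain Python to stay self-contained; the data are generated from a known quadratic purely for illustration, and in real work you would normally lean on a library routine instead.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Augment the matrix so row operations apply to both sides at once.
    aug = [list(row) + [rhs] for row, rhs in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    design = [[1.0, x, x * x] for x in xs]  # columns: intercept, x, x squared
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated from y = 1 + 2x + 3x^2 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]

beta_0, beta_1, beta_2 = fit_quadratic(xs, ys)  # recovers roughly 1, 2, 3
```

Note that the fit keeps both the x and x² columns.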

However, if you include a higher order variable or interaction, you should keep the lower orders and main effect variables in your final equation, whether they are significant or not.

You should not end up with y=β₀+β₂x² or y=β₀+β₁x₁*x₂.

We could also use a random forest regressor, which is visualized below but will be explained in more detail later in terms of its alternative, the random forest classifier, which is more commonly used.

The regressor is used similarly to a logistic model where the output is a probability of a binary label.

In simplest terms, a random forest creates hundreds of decision trees that all predict an outcome, and the final output is either the most common prediction (for a classifier) or the average of the predictions (for a regressor).
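
The final combining step can be sketched on its own. This example skips how the trees are grown entirely and simply assumes you already have one prediction per tree (the numbers are invented):

```python
from collections import Counter
from statistics import mean

def aggregate_classifier(tree_predictions):
    """Return the most common label predicted across the trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def aggregate_regressor(tree_predictions):
    """Return the average of the values predicted across the trees."""
    return mean(tree_predictions)

# Hypothetical outputs from five trees for a single observation.
label_votes = [1, 0, 1, 1, 0]
value_votes = [0.8, 0.6, 0.7, 0.9, 0.5]

forest_label = aggregate_classifier(label_votes)  # 1, chosen by three of five trees
forest_value = aggregate_regressor(value_votes)   # about 0.7
```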

[Figure: Random Forest Classifier for Titanic Survival]

Now you may be thinking, can’t you use any of these models for probability targets? If the training set’s y-values are 0 to 1, the model will just predict y-values from 0 to 1, right? Well, yes and no.

The model will most likely predict values between 0 and 1, but a linear model can also produce values below 0 or above 1, which are illogical in the case of probabilities.

You can give 110% effort in your model building, but an observation cannot have a 110% likelihood of some category.

Also, a linear model would mean there are equal differences between people that have .10 and .15 probabilities of receiving a diagnosis and people that have .95 and 1.0 probabilities of receiving a diagnosis.

Obviously, if someone 100% has a disease, there is probably something else going on that is not being explained in the linear model because it was an insignificant feature for probabilities below .50.
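
One way to see that these steps are not equal is to look at them on the log-odds scale, which is the scale on which a logistic model is linear. In this sketch I use .94 to .99 for the upper step instead of .95 to 1.0, since the log-odds of exactly 1.0 is infinite.

```python
import math

def log_odds(p):
    """Convert a probability into log-odds, the scale a logistic model is linear on."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# The same 0.05 change in probability, at the low end and at the high end.
low_step = log_odds(0.15) - log_odds(0.10)
high_step = log_odds(0.99) - log_odds(0.94)
```

The step near the top of the range is several times larger in log-odds than the step near the bottom, so a logistic model requires much more evidence to move someone from .94 to .99 than from .10 to .15.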

## Classification Models

If the goal of your analysis is to create a model that predicts the label of an observation, you want to use a classification model.

The simplest model is, again, a logistic model.

Here, however, the logistic model can also be trained on a nonbinary target variable by creating a dummy target variable for each class and running an individual logistic model on each (an approach often called one-vs-rest).
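
A rough sketch of the bookkeeping involved, leaving out the model fitting itself. The class names and probabilities are invented, and picking the class whose model gives the highest probability is the usual convention, which I am assuming here.

```python
def make_dummy_targets(labels):
    """Build one 0/1 target per class: 1 where the label matches, 0 elsewhere."""
    classes = sorted(set(labels))
    return {c: [1 if label == c else 0 for label in labels] for c in classes}

def pick_label(class_probabilities):
    """Given each class model's probability, return the most likely class."""
    return max(class_probabilities, key=class_probabilities.get)

# Hypothetical nonbinary target.
labels = ["cat", "dog", "bird", "dog", "cat"]
dummies = make_dummy_targets(labels)
# dummies["dog"] is [0, 1, 0, 1, 0]: the target for the "dog vs everything else" model.

# Hypothetical outputs of the three fitted logistic models for one new observation.
predicted = pick_label({"bird": 0.10, "cat": 0.35, "dog": 0.80})  # "dog"
```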

You can also add the L1 and L2 penalties to this logistic model in order to conduct LASSO and Ridge Logistic models.

More useful however is a random forest classifier which, like the random forest regressor, can include features that may only be significant at a specific point.

To reiterate, this method takes the concept of decision trees and creates a random forest of them, randomly selecting the variables each one includes, and then outputs a prediction based on the whole forest.

Within the code used to run this model, you can specify many hyper-parameters such as the number of trees generated, the minimum number of observations per leaf, the maximum splits, the maximum depth of the tree, etc.

All of these hyper-parameters can help create a more accurate model, but a random forest can still be overfit.

If your trees are too large, they are likely too specific to the training data to generalize to a test set.

[Figure: Random Forest Classifier]

Lastly, a neural network can be created in order to predict an observation’s label.

This is the most complex method, but it has certain advantages over the previous methods.

Mainly, it can also be used for unsupervised learning.

This means that the algorithm can cluster groups based on similarities it detects without the training data being previously labeled.

While complicated to create, neural networks can be more accurate at predicting labels, which is important for high-stakes predictions such as disease diagnosis or fraud detection.

Essentially, the algorithm takes in a set of inputs, finds patterns and trends within them, and outputs a prediction (supervised) or clustered groups (unsupervised).

With more iterations and larger training sets, the neural network can become extremely accurate, but be careful of overfitting it to your training set by creating too many layers within the net.
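
To give a feel for what "layers" means, here is a minimal sketch of a single forward pass through a tiny supervised network with one hidden layer. The weights are hand-picked placeholders, not trained values, and training (the part that actually finds the patterns) is left out entirely.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Feed the inputs through each layer in turn and return the final outputs."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),  # hidden layer weights and biases
    ([[1.2, -0.7]], [0.05]),                   # output layer weights and bias
]

probability = forward([1.0, 2.0], layers)[0]  # predicted probability of label 1
label = 1 if probability > 0.5 else 0
```

Every extra layer adds another set of weights to fit, which is why deeper networks are more flexible and also easier to overfit.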

[Figure: Neural Network]

## Summary

The most important thing to take into consideration when choosing a model to use for prediction is the characteristics of your target variable.

Is it continuous or discrete? Is it a quantity or a label? Is it a probability of a category? Is it linearly related to all the explanatory variables? Do I want all these variables to be included in its prediction? The answers to these questions can lead you to the best model for your predictions.