Logistic Regression as a Nonlinear Classifier

Most certainly not.

As p goes from 0 to 1, log(p/(1-p)) goes from -inf to +inf.

So we need f to be unbounded as well in both directions.

The possibilities for f are endless so long as we make sure that f has a range from -inf to +inf.
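The range argument above can be checked numerically. The sketch below is illustrative only: the helper names `logit` and `sigmoid` are assumptions of this example and do not come from the original post.

```python
import numpy as np

def logit(p):
    """Log-odds log(p / (1 - p)) for probabilities strictly between 0 and 1."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def sigmoid(f):
    """Inverse of the logit: maps any real f back to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(f, dtype=float)))

# Probabilities approaching 0 and 1 give log-odds of growing magnitude.
probs = np.array([1e-9, 0.1, 0.5, 0.9, 1 - 1e-9])
log_odds = logit(probs)
print(log_odds)

# Round trip: applying the sigmoid to the log-odds recovers the probabilities.
recovered = sigmoid(log_odds)
```

Because the log-odds are unbounded, any candidate f that is confined to a finite interval cannot reproduce probabilities arbitrarily close to 0 or 1.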


2.1 A linear form for f(x,y; c)

The default linear form used in general (e.g. in scikit-learn) is

f(x,y; c) = c_0 + c_1 x + c_2 y    (Equation 5)

Choosing this form will work for sure. It leads to traditional logistic regression as available in scikit-learn, and it is the reason logistic regression is known as a linear classifier.


2.2 A higher-order polynomial for f(x,y; c)

An easy extension of Equation 5 would be to use a higher-degree polynomial.

A 2nd-order one would simply be:

f(x,y; c) = c_0 + c_1 x + c_2 y + c_3 x² + c_4 x y + c_5 y²    (Equation 6)

Equation 6 is a simple extension that lends itself to embedding nonlinearities in linear logistic regression.

Note that the above formulation is identical to the linear case if we treat x², xy, and y² as three additional independent features of the problem.

The impetus for doing so would be that we can then directly apply the API from scikit-learn to get an estimate for c.
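A minimal sketch of the augmented-feature idea follows. It assumes synthetic data with a circular boundary rather than the ellipse used in the results, and the helper `augment` is a hypothetical name. `C` is set large here to weaken scikit-learn's default L2 penalty; that is a choice of this sketch, not something the original post states.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(2000, 2))
x, y = xy[:, 0], xy[:, 1]

# Illustrative nonlinear boundary (assumption): class 1 inside a circle of radius 0.6.
z = (x**2 + y**2 < 0.36).astype(int)

def augment(x, y):
    """Stack x, y with the extra features x^2, x*y, y^2 (Equation 6 without c_0)."""
    return np.column_stack([x, y, x**2, x * y, y**2])

# Plain linear logistic regression on (x, y) only.
linear_model = LogisticRegression().fit(np.column_stack([x, y]), z)

# Same estimator on the augmented features; the intercept plays the role of c_0.
augmented_model = LogisticRegression(C=1e4, max_iter=5000).fit(augment(x, y), z)

linear_acc = linear_model.score(np.column_stack([x, y]), z)
augmented_acc = augmented_model.score(augment(x, y), z)
print(linear_acc, augmented_acc)
```

Note that `LogisticRegression` applies an L2 penalty by default (`C=1.0`), so its estimate is a penalized one unless the penalty is weakened or disabled. Whether that bears on the gap reported in the results is not something this sketch establishes.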

We see however in the results section that the c obtained this way is not as good as directly solving Equation 4 for c.

It is a bit of a mystery as to why.

But in any case solving Equation 4 is easy enough with modules from scipy.

The derivatives of f with respect to the coefficients, which we need for Equation 4, fall out simply as

∂f/∂c_0 = 1, ∂f/∂c_1 = x, ∂f/∂c_2 = y, ∂f/∂c_3 = x², ∂f/∂c_4 = x y, ∂f/∂c_5 = y²

2.3 Other generic forms for f(x,y; c)

A periodic form like f(x,y; c) = sin(c_0 x + c_2 y) = 0 will not work as its range is limited.

But the following will.

f(x,y; c) = c_0 + c_1 sin(c_2 x²) + c_3 y    (Equation 7. A non-polynomial form for f)

We can again evaluate the derivatives and solve for the roots of Equation 4, but sometimes it is simpler to just directly maximize log(L) in Equation 3, and that is what we will do in the simulations to follow.
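Returning to the range argument for the periodic form, a short worked bound makes the limitation concrete. It assumes the usual inverse of the log-odds, p = 1/(1 + e^{-f}), which is not restated in this section.

```latex
\[
-1 \le \sin(\cdot) \le 1
\;\Longrightarrow\;
\frac{1}{1+e} \;\le\; p = \frac{1}{1+e^{-f}} \;\le\; \frac{e}{1+e},
\qquad \text{i.e. } 0.269 \lesssim p \lesssim 0.731 .
\]
```

A purely sinusoidal f can therefore never express a probability near 0 or 1. In Equation 7 the term c_3 y is unbounded in y (for c_3 ≠ 0), so f again covers the whole real line.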


3. Simulations

We generate an equal number of points in the x, y feature space that belong to a class (Z = 0 when f(x,y; c) > small_value) and that do not belong (Z = 1 when f(x,y; c) < small_value). This is possible because we know the explicit functional form of f(x,y; c).

The data is split 80/20 for train vs test.
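The data generation and split described above might look like the following sketch. Several details are assumptions of this example rather than facts from the original post: the sampling region (the unit square), the margin being applied symmetrically as |f| <= small_value, the value of small_value, and the names `f_quadratic` and `generate_points`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def f_quadratic(x, y, c):
    """Second-order polynomial f(x, y; c) with six coefficients c_0..c_5."""
    return c[0] + c[1] * x + c[2] * y + c[3] * x**2 + c[4] * x * y + c[5] * y**2

def generate_points(f, c, n_per_class, small_value=1e-2, seed=0):
    """Rejection-sample n_per_class points per class from the unit square.

    Z = 0 where f > small_value and Z = 1 where f < -small_value (assumed margin);
    points inside the band |f| <= small_value are discarded.
    """
    rng = np.random.default_rng(seed)
    kept = {0: [], 1: []}
    while len(kept[0]) < n_per_class or len(kept[1]) < n_per_class:
        x, y = rng.uniform(0.0, 1.0, size=2)
        value = f(x, y, c)
        if abs(value) <= small_value:
            continue  # too close to the boundary
        label = 0 if value > 0 else 1
        if len(kept[label]) < n_per_class:
            kept[label].append((x, y))
    X = np.array(kept[0] + kept[1])
    Z = np.array([0] * n_per_class + [1] * n_per_class)
    return X, Z

# Illustrative coefficients: the six values passed on the command line in the results.
c_true = np.array([2.25, -3.0, -18.0, 35.0, -64.0, 50.0])
X, Z = generate_points(f_quadratic, c_true, n_per_class=1000)

# 80/20 split, stratified so both classes stay balanced in train and test.
X_train, X_test, Z_train, Z_test = train_test_split(
    X, Z, test_size=0.2, stratify=Z, random_state=0
)
```

Stratifying is a design choice of this sketch; a plain random split would also give roughly balanced classes at this sample size.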

We use the API from scikit-learn for the linear case that we want to compare with.

We use the scipy.optimize module when choosing to solve Equation 4 for c or to maximize the likelihood in Equation 3.

The functions LL and dLLbydc in the code simply implement Equations 3 and 4 respectively.

The scipy routines are for minimization so we negate the sign in each case as we want to maximize the likelihood.

Finally we solve for c starting with a small initial guess close to 0.
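Since the original snippet is not reproduced here, the following is a reconstruction under stated assumptions, not the author's code. It assumes Equation 3 is the standard Bernoulli log-likelihood with p = 1/(1 + exp(-f)), that Equation 4 is its gradient set to zero, and that Z = 1 where f > 0 (flip the labels or the sign of c for the opposite convention). Only the names `LL` and `dLLbydc` come from the text; `features` and the sampling choices are hypothetical.

```python
import numpy as np
from scipy.optimize import check_grad, minimize
from scipy.special import expit

def features(x, y):
    """Columns are df/dc_i for the quadratic f: 1, x, y, x^2, x*y, y^2."""
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

def LL(c, A, z):
    """Log-likelihood: sum of z*f - log(1 + exp(f)), in a numerically stable form."""
    f = A @ c
    return np.sum(z * f - np.logaddexp(0.0, f))

def dLLbydc(c, A, z):
    """Gradient of LL with respect to c: sum of (z - p) * df/dc."""
    return A.T @ (z - expit(A @ c))

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 2000)
y = rng.uniform(0.0, 1.0, 2000)
c_true = np.array([2.25, -3.0, -18.0, 35.0, -64.0, 50.0])
A = features(x, y)
z = (A @ c_true > 0).astype(float)  # assumed convention: Z = 1 where f > 0

c_start = np.full(6, 1e-3)  # small initial guess close to 0

# Sanity check: analytic gradient against finite differences at the start point.
grad_error = check_grad(LL, dLLbydc, c_start, A, z)

# scipy minimizes, so negate both the objective and its gradient.
result = minimize(
    lambda c: -LL(c, A, z),
    c_start,
    jac=lambda c: -dLLbydc(c, A, z),
    method="BFGS",
)
c_hat = result.x
accuracy = np.mean((A @ c_hat > 0) == (z == 1))
print(c_hat / c_hat[0], accuracy)
```

With perfectly separable data the likelihood has no finite maximizer, so the optimizer may stop with a precision warning while the coefficients keep growing. The ratios between coefficients, and hence the boundary f = 0, are what is meaningful in that case.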


4. Results

We look at the polynomial and the other generic case separately.


4.1 f(x,y; c) = c_0 + c_1 x + c_2 y + c_3 x² + c_4 x y + c_5 y²

We get an ellipse as the decision boundary for the c values passed as the last six arguments of the command below.

We generate 1000 data points for each class based on the above function and apply logistic regression to that data to see what it predicts for the decision boundary.

pipenv run python ./….py 1000 2.25 -3.0 -18.0 35.0 -64.0 50.0

Figure 2 below shows the data distribution and the decision boundaries obtained by the different approaches.

The contour traced by green triangles separates the data perfectly, i.e. it gets an F1 score of 1.

It is obtained by solving Equation 4 for c.

The red line is obtained by the traditional logistic regression approach — clearly a straight line.

It tries to do the best it can given the nonlinearities and gets an F1 score of, well, 0.5…

The contour traced by the blue triangles is interesting.

It is obtained by applying traditional logistic regression on the augmented feature space.

That is “x²”, “x*y”, and “y²” are treated as three additional features thereby linearizing f for use with scikit-learn.

It gets an F1 score of 0.89, which is not bad.

But it is not entirely clear why it should be worse than solving dlog(L)/dc = 0 (Equation 4).

Figure 2.

The elliptical boundary separates the two classes.

Solving dlog(L)/dc = 0 yields the green contour that best matches the true decision boundary.

The blue contour obtained by solving the augmented linear form has over 10% error (why?), while the default application of logistic regression is 50% in error.

The actual values for the coefficients are shown in Table 1 below.

c_0 is scaled to unity so we can compare.

Clearly the coefficients obtained by solving dlog(L)/dc = 0 are closest to the actual values used in generating the data.
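Scaling c_0 to unity works for comparison because, for a form that is linear in c such as Equation 6, multiplying every coefficient by the same nonzero constant leaves the boundary f = 0 unchanged. A small sketch, with the hypothetical helper name `scale_to_unit_c0`:

```python
import numpy as np

def scale_to_unit_c0(c):
    """Divide all coefficients by c_0 so that the scaled c_0 equals 1."""
    c = np.asarray(c, dtype=float)
    if c[0] == 0.0:
        raise ValueError("c_0 is zero; choose another coefficient to scale by")
    return c / c[0]

# Illustrative values: the coefficients passed on the command line above.
c_actual = np.array([2.25, -3.0, -18.0, 35.0, -64.0, 50.0])

# Any nonzero multiple describes the same boundary and scales to the same vector.
c_multiple = 7.5 * c_actual

scaled_actual = scale_to_unit_c0(c_actual)
scaled_multiple = scale_to_unit_c0(c_multiple)
print(scaled_actual)
```

This does not carry over unchanged to Equation 7, where c_2 sits inside the sine and is not a plain scale factor.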

Table 1. Coefficients obtained for the elliptic boundary

4.2 f(x,y; c) = c_0 + c_1 sin(c_2 x²) + c_3 y = 0

Figure 3 shows the data distribution and the predicted boundaries upon running the simulation with

pipenv run python ./….py 1000 1.0 -1.0 1.0 -1.0

The straight line from traditional logistic regression obtains an F1 score of 0.675, whereas the direct maximization of log(L) gets a perfect score.

Figure 3.

The green line obtained by maximizing log(L) perfectly separates the two classes.

Table 2 below compares the values for the coefficients obtained in the simulation.

It is interesting to note that the signs of c_1 and c_2 are opposite between the actual values and those obtained by the optimization.

That is fine because sin(-k) = -sin(k), so c_1 sin(c_2 x²) is unchanged when both signs flip together.
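The sign symmetry can be verified directly. In this sketch the function name `f_generic` is hypothetical, and the coefficient values are the four passed on the command line above, taken in the order c_0 to c_3 as an assumption.

```python
import numpy as np

def f_generic(x, y, c):
    """Equation 7: f(x, y; c) = c_0 + c_1 sin(c_2 x^2) + c_3 y."""
    return c[0] + c[1] * np.sin(c[2] * x**2) + c[3] * y

c_actual = np.array([1.0, -1.0, 1.0, -1.0])

# Flip the signs of c_1 and c_2 together; c_0 and c_3 stay as they are.
c_flipped = c_actual * np.array([1.0, -1.0, -1.0, 1.0])

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 500)
y = rng.uniform(-2.0, 2.0, 500)

f_actual = f_generic(x, y, c_actual)
f_flipped = f_generic(x, y, c_flipped)
max_difference = np.max(np.abs(f_actual - f_flipped))
print(max_difference)
```

Both coefficient sets define the same function, so the likelihood cannot distinguish them and either may be returned by the optimizer.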

Table 2. Coefficients obtained for the generic case

5. Conclusions

Logistic regression has traditionally been used to come up with a hyperplane that separates the feature space into classes.

But if we suspect that the decision boundary is nonlinear we may get better results by attempting some nonlinear functional forms for the logit function.

Solving for the model parameters can be more challenging but the optimization modules in scipy can help.

Originally published at xplordat.com on March 13, 2019.

