Why we choose this specific cost function (and not a linear function, or any other function for that matter) is again a topic for another post. Stare at the graphs for a while: these are the cost functions when the true value is 1 (left) and when it is 0 (right).

Imagine you have a data point whose true value is 1 and your hypothesis function gives you a value of 0.1 (red dot in the left graph); the cost for such a point is high. Similarly, if for a true value of 1 your hypothesis outputs a value of 0.9, the cost goes down to near 0 (black cross in the left graph). The same mechanism works for true values of 0, as shown in the right graph: there the red dot corresponds to a hypothesis value of 0.1 and the black cross to a hypothesis value of 0.9.

That's it; we now have to minimize the cost over all our data points by adjusting the parameters beta and gamma. The minimization is done by our old friend Gradient Descent. That is all there is to logistic regression. Let us see another example with logistic regression in action.

Admission prediction

Suppose you want to predict whether you will be admitted to a particular university based on the scores you obtained in two different examinations. You have data for various students from previous years.

Look how the linear discriminator looks after the first few iterations of the optimizer. After 10 iterations our line is getting there! After a few more iterations... voila! You can see the learned boundary separates the data into two regions. The separation is not perfect, which results in somewhat lower classification accuracy, i.e. some points will be classified incorrectly.
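To make the cost behaviour from the two graphs concrete, here is a minimal sketch of the per-point cross-entropy cost; the sample predictions 0.1 and 0.9 are the red-dot and black-cross values mentioned in the text:

```python
import math

def cost(y_true, y_pred):
    """Cross-entropy cost for a single data point.

    Returns -log(y_pred) when the true value is 1 (left graph),
    and -log(1 - y_pred) when the true value is 0 (right graph).
    """
    if y_true == 1:
        return -math.log(y_pred)
    return -math.log(1 - y_pred)

# True value 1: predicting 0.1 is heavily penalized (red dot),
# while predicting 0.9 costs almost nothing (black cross).
print(round(cost(1, 0.1), 3))  # 2.303 (high cost)
print(round(cost(1, 0.9), 3))  # 0.105 (near zero)

# True value 0: the roles of the two predictions flip.
print(round(cost(0, 0.9), 3))  # 2.303 (high cost)
```

The asymmetry is the point: the further the hypothesis output drifts from the true value, the faster the cost blows up, which is exactly what the two curves show.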
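The whole admission-prediction pipeline can be sketched along these lines. Everything here is illustrative: the toy exam scores, labels, learning rate, and iteration count are made up, and I write the post's beta and gamma as `w1` and `w2`, adding an explicit bias term so the boundary need not pass through the origin:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train(points, labels, lr=0.1, iters=1000):
    """Fit w1, w2 (the post's beta and gamma) and a bias b by
    batch gradient descent on the cross-entropy cost."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(iters):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(points, labels):
            p = sigmoid(w1 * x1 + w2 * x2 + b)
            err = p - y  # gradient of the cross-entropy w.r.t. the linear term
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

# Toy data: two exam scores (scaled to 0-1) per student, label = admitted.
points = [(0.2, 0.3), (0.3, 0.2), (0.8, 0.9), (0.9, 0.7)]
labels = [0, 0, 1, 1]

w1, w2, b = train(points, labels)
# The learned line w1*x1 + w2*x2 + b = 0 is the linear discriminator;
# points with sigmoid(...) > 0.5 fall on the "admitted" side.
print(sigmoid(w1 * 0.9 + w2 * 0.7 + b) > 0.5)
```

Running the loop for more iterations corresponds to the snapshots in the post: the boundary starts off useless, looks roughly right after a handful of updates, and settles into a line that separates most (though not necessarily all) of the points.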