# Deep Dive into Support Vector Machine

So, the data points in figure 4 can be classified as shown in figure 6.

Figure 6

## Role of Kernel Functions and Problem 2

Another problem in SVM is that transforming the data into a higher-dimensional space, determining the optimal hyperplane in the new space, and then transforming back to the original space is complex and computationally expensive. For example, if there are 1000 feature vectors in a 100-dimensional space and that space is transformed into a 1000-dimensional space, then each feature vector will have 1000 components, and on the order of 1000 * 1000 computations will be required to determine the optimal hyperplane (so many because the hyperplane is described as W.X + b = 0, where X is a feature vector). The hyperplane then has to be mapped back to the original space. This whole process incurs a high overhead.

The solution to the above problem is the kernel function. The interesting fact about kernel functions is that they achieve the effect of the above mapping without actually going to the higher-dimensional space. In other words, a kernel function returns the dot product that two vectors would have in the higher-dimensional space while working only with the original vectors, so the computations in the higher-dimensional space never have to be performed explicitly.

The transformation in the visualisation shown in figure 5 is done using the kernel induced by the polynomial mapping ϕ((a, b)) = (a, b, a² + b²), as shown in figure 7.

Figure 7: Training example of SVM with kernel given by ϕ((a, b)) = (a, b, a² + b²). Source: https://en.wikipedia.org/wiki/Support_vector_machine

Please note that kernel functions are only applicable if the problem is formulated in terms of dot products (inner products). Fortunately, the formulation of SVM depends only on dot products (this will be proved in the coming sections).

## Mathematical Modelling

Let us crack the mathematics behind Support Vector Machine. Given a training set {(Xᵢ, Yᵢ) where i = 1, 2, 3, …, n}, Xᵢ ∈ ℜᵐ, Yᵢ ∈ {+1, -1}. Here, Xᵢ is the feature vector for the iᵗʰ data point and Yᵢ is the label for the iᵗʰ data point. The label can be either ‘+1’ for the positive class or ‘-1’ for the negative class. The value ‘1’ is taken for mathematical convenience.

Let W be a vector perpendicular to the decision boundary (the optimal hyperplane) and X be an unknown vector. Then the projection of X on the unit vector of W will determine whether that unknown point belongs to the positive class or the negative class, as shown in figure 8.

Figure 8

NOTE: In the coming sections, Wᵗ or W raised to T means W transpose.
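
The projection idea can be sketched in a few lines of Python. This is only an illustration, not the final SVM decision rule: the vector `W`, the threshold `c`, and the helper names `projection_length` and `classify` are hypothetical choices made for this example. In an actual SVM, W and the offset are learned from the training data.

```python
import math

def projection_length(x, w):
    """Signed length of the projection of x onto the unit vector of w."""
    dot = sum(xi * wi for xi, wi in zip(x, w))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return dot / norm_w

def classify(x, w, c):
    """Return +1 if the projection reaches the threshold c, otherwise -1.

    The threshold c is an assumption made for illustration only.
    """
    return 1 if projection_length(x, w) >= c else -1

# Hypothetical values chosen for this example.
W = (3.0, 4.0)   # vector perpendicular to the decision boundary, ||W|| = 5
c = 2.0          # assumed distance of the boundary from the origin along W

print(classify((4.0, 3.0), W, c))  # projection is 24 / 5 = 4.8, beyond the boundary
print(classify((1.0, 0.0), W, c))  # projection is 3 / 5 = 0.6, short of the boundary
```

A point whose projection on the unit vector of W is long enough falls on the positive side of the boundary. A point whose projection is too short falls on the negative side.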
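
Returning to the kernel function discussed earlier, the claim that a kernel gives the higher-dimensional dot product without performing the mapping can be checked numerically for ϕ((a, b)) = (a, b, a² + b²). Expanding ϕ(x).ϕ(y) gives K(x, y) = x.y + ‖x‖²‖y‖², which uses only the original two-dimensional vectors. The function names `phi`, `explicit_dot` and `kernel` below are illustrative and do not come from any library.

```python
def phi(p):
    """Map a 2-D point (a, b) to the 3-D point (a, b, a^2 + b^2)."""
    a, b = p
    return (a, b, a * a + b * b)

def explicit_dot(x, y):
    """Dot product computed after mapping both points to 3-D."""
    return sum(u * v for u, v in zip(phi(x), phi(y)))

def kernel(x, y):
    """Same value computed in the original 2-D space, without calling phi."""
    dot = x[0] * y[0] + x[1] * y[1]
    sq_norm_x = x[0] ** 2 + x[1] ** 2
    sq_norm_y = y[0] ** 2 + y[1] ** 2
    return dot + sq_norm_x * sq_norm_y

x, y = (1.0, 2.0), (3.0, -1.0)
print(explicit_dot(x, y))  # (1, 2, 5) . (3, -1, 10) = 51.0
print(kernel(x, y))        # 1 + 5 * 10 = 51.0
```

Both routes give the same number, but `kernel` never builds the three-dimensional vectors. For a mapping into thousands of dimensions, this is what saves the overhead described above.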