A support vector machine (SVM) is a linear model for classification and regression problems. The algorithm creates a line or a hyperplane that separates data into classes. It can solve linear and non-linear problems and works well for many practical challenges.
Support Vector Machine (SVM) Explained
A support vector machine (SVM) is a linear model that creates a line or hyperplane to separate data into classes. It’s used to solve classification and regression problems in machine learning.
I’ll offer an overview of SVMs: the theory behind them, how they apply to non-linearly separable data sets and how to implement them in Python.
What Is a Support Vector Machine?
SVMs find a separating line (or hyperplane) between data of two classes. The algorithm takes the data as input and outputs a line that separates those classes, if possible.
Suppose you have a data set, as shown below, and you need to classify the red rectangles from the blue ellipses, let’s say the positives from the negatives. Your task is to find an ideal line that separates this data set into two classes, red and blue.
Not too challenging, right?
But there isn’t a unique line that does the job. In fact, there are infinitely many lines that can separate these two classes. So, how does SVM find the ideal one?
Let’s examine some probable candidates and figure it out ourselves.
We have two candidates here, the green colored line and the yellow colored line. Which line do you think best separates the data?
If you selected the yellow line, then congrats, because that’s the line we are looking for. It’s visually quite intuitive in this case that the yellow line classifies better, but we need something concrete to justify our choice of line.
The green line in the image above is quite close to the red class. Though it classifies the data set, it’s not a generalized line, and in machine learning, our goal is to find a more generalized separator.
How Support Vector Machines Find the Optimal Hyperplane
According to the SVM algorithm, we find the points closest to the line from both classes. These points are called support vectors. Now, we compute the distance between the line and the support vectors. This distance is called the margin. Our goal is to maximize the margin. The hyperplane for which the margin is maximum is the optimal hyperplane.
SVM tries to make a decision boundary in such a way that the separation between the two classes, the “street” between them, is as wide as possible.
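To make this a little more concrete, here is the standard hard-margin formulation of that idea (a textbook statement, not tied to any particular library). Write the separating hyperplane as w·x + b = 0 and label the two classes yᵢ = +1 and yᵢ = −1. Requiring every training point to satisfy yᵢ(w·xᵢ + b) ≥ 1 puts all points on the correct side of the street, and the width of the street works out to 2/‖w‖. So maximizing the margin is the same as minimizing ‖w‖²/2 subject to those constraints, and the points where the constraint holds with equality are exactly the support vectors.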
Seems simple, right? Let’s consider a more complex data set that isn’t linearly separable.
How Support Vector Machines Calculate a Hyperplane in Higher Dimensions
This data is clearly not linearly separable. We can’t draw a straight line that can classify this data. But the data can be converted to linearly separable data in a higher dimension. Let’s add one more dimension and call it the z-axis. The coordinates on the z-axis will be governed by the constraint z = x² + y².
Basically, the z coordinate is the square of the distance of the point from the origin. Let’s plot the data on the z-axis.
Now, the data is linearly separable. Let the purple line separating the data in the higher dimension be z = k, where k is a constant. Since z = x² + y², we get x² + y² = k, which is the equation of a circle. So, we can project this linear separator from the higher dimension back to its original dimensions using this transformation.
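Here is a rough sketch of that idea in code. It assumes scikit-learn and uses its make_circles helper to generate a circular toy data set like the one described above (the sample size and noise values are just illustrative): we add the z = x² + y² feature by hand and fit a plain linear SVM on the augmented data.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC
# Toy data: an inner circle of one class surrounded by a ring of the other
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
# Add the extra dimension z = x^2 + y^2 by hand
z = (X ** 2).sum(axis=1).reshape(-1, 1)
X_3d = np.hstack([X, z])
# A plain linear SVM can now separate the classes in the higher dimension
clf = SVC(kernel='linear')
clf.fit(X_3d, y)
print(clf.score(X_3d, y))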
We can classify data by adding an extra dimension to it so that it becomes linearly separable and then projecting the decision boundary back to the original dimensions using a mathematical transformation. But finding the correct transformation for any given data set isn’t that easy. Thankfully, we can use kernels in scikit-learn’s SVM implementation to do this job.
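As a minimal sketch of that (again assuming scikit-learn and the same make_circles toy data as above), the RBF kernel performs a transformation like this implicitly, so we never have to construct the extra dimension ourselves:
from sklearn.datasets import make_circles
from sklearn.svm import SVC
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
# The RBF kernel handles the non-linear mapping internally
clf = SVC(kernel='rbf')
clf.fit(X, y)
print(clf.score(X, y))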
What Is a Hyperplane in Support Vector Machine?
A hyperplane in an n-dimensional Euclidean space is a flat, n-1 dimensional subset of that space that divides the space into two disconnected parts.
For example, let’s assume a line to be our one-dimensional Euclidean space, meaning our data sets lie on a line. Now, pick a point on that line; the point divides the line into two parts. The line has one dimension, while the point has zero dimensions. So, a point is a hyperplane of the line.
For two dimensions, we saw that the separating line was the hyperplane. Similarly, for three dimensions a plane with two dimensions divides the 3D space into two parts and acts as a hyperplane. For a space of n dimensions, we have a hyperplane of n-1 dimensions separating it into two parts.
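In general, a hyperplane in n dimensions can be written as the set of points x satisfying w·x + b = 0 for some weight vector w and offset b: in one dimension this is a single point, in two dimensions a line, and in three dimensions a plane.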
How to Implement Support Vector Machine in Python
import numpy as np
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
We have our points in X, and the classes they belong to in y. Now, we train our SVM model with the above data set. For this example, I have used a linear kernel.
from sklearn.svm import SVC
clf = SVC(kernel='linear')
clf.fit(X, y)
To predict the class of a new data point:
prediction = clf.predict([[0,6]])
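For this toy data set, the point [0, 6] lies on the same side of the boundary as the class 2 examples, so the prediction should come back as class 2. As a small follow-up sketch (using attributes that scikit-learn exposes on a fitted linear-kernel SVC), we can also inspect the support vectors and compute the margin width as 2 divided by the length of the weight vector:
import numpy as np
print(prediction)             # should print the predicted class, 2, for this data
print(clf.support_vectors_)   # the training points closest to the decision boundary
w = clf.coef_[0]              # weight vector; available because the kernel is linear
print(2 / np.linalg.norm(w))  # the width of the margin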
How to Set Tuning Parameters for the Support Vector Machine
Parameters are arguments that you pass when you create your classifier. The following are the important parameters for SVM:
1. Set the Value for C
It controls the trade-off between a smooth decision boundary and classifying training points correctly. A large value of C means you will get more training points classified correctly.
Smooth decision boundary vs classifying all points correctly
Consider the figure above. There are a number of decision boundaries that we can draw for this data set. A straight, green-colored decision boundary is quite simple, but it comes at the cost of a few points being misclassified. These misclassified points are called outliers.
We can also make something that is considerably more wiggly, the sky-blue decision boundary, that contains all of the training points. Of course, the trade-off of having something this intricate and complicated is that it’s not going to generalize as well to our test set. So something that is simpler and straighter may actually be the better choice if you look at test accuracy. A large value of C means you will get more intricate decision curves trying to fit in all the points. Figuring out how much you want a smooth decision boundary versus high training accuracy is part of the artistry of machine learning. So, try different values of C for your data set to get a well-balanced curve and avoid overfitting, as sketched below.
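Here is a rough sketch of how you might explore this, assuming scikit-learn and your own feature matrix X and labels y (the C values are just illustrative): fit one classifier with a small C and one with a large C, then compare them on held-out data.
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Small C: smoother boundary, tolerates some misclassified training points
smooth = SVC(kernel='rbf', C=0.1).fit(X_train, y_train)
# Large C: tries hard to classify every training point correctly
wiggly = SVC(kernel='rbf', C=1000).fit(X_train, y_train)
print(smooth.score(X_test, y_test), wiggly.score(X_test, y_test))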
2. Set the Value for Gamma
This defines how far the influence of a single training example reaches. A low value of gamma means that every point has a far reach, while a high value of gamma means that every point has only a close reach.
If gamma has a very high value, then the decision boundary is just going to be dependent upon the points that are very close to the line, which results in ignoring some of the points that are very far from the decision boundary. This is because the closer points get more weight, and it results in a wiggly curve, as shown in the previous graph. On the other hand, if the gamma value is low, even the far away points get considerable weight and we get a more linear curve.
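A quick way to see this effect is the same kind of comparison as for C (again a sketch assuming your own X and y, with illustrative gamma values):
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Low gamma: each point influences a wide region, giving a smoother boundary
broad = SVC(kernel='rbf', gamma=0.01).fit(X_train, y_train)
# High gamma: only nearby points matter, giving a wigglier boundary
narrow = SVC(kernel='rbf', gamma=100).fit(X_train, y_train)
print(broad.score(X_test, y_test), narrow.score(X_test, y_test))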
I hope this blog post helped you understand SVMs.