SKLearn: Getting distance of each point from decision boundary?

I am using SKLearn to run SVC on my data.

from sklearn import svm

svc = svm.SVC(kernel='linear', C=C).fit(X, y)

How can I get the distance of each data point in X from the decision boundary?



Solution 1:[1]

For a linear kernel, the decision boundary is w · x + b = 0, and the signed distance from a point x to the boundary is (w · x + b) / ||w||, which is exactly decision_function(x) / ||w||.

import numpy as np

y = svc.decision_function(X)
w_norm = np.linalg.norm(svc.coef_)
dist = y / w_norm

For non-linear kernels, there is no way to get the absolute distance in the input space. But you can still use the result of decision_function as a relative distance.
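Putting Solution 1 together, a minimal runnable sketch (using `make_blobs` toy data as a stand-in for the question's `X`, `y`, which are not shown):

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Hypothetical 2-class toy data standing in for the question's X, y
X, y = make_blobs(n_samples=40, centers=2, random_state=0)

svc = svm.SVC(kernel='linear', C=1.0).fit(X, y)

# Signed distance of every point from the separating hyperplane:
# decision_function gives w.x + b; divide by ||w|| to get a distance
dist = svc.decision_function(X) / np.linalg.norm(svc.coef_)
```

The sign of `dist` tells you which side of the hyperplane each point is on, and its magnitude is the Euclidean distance to the hyperplane.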

Solution 2:[2]

It happens that I am doing homework 1 of a course named Machine Learning Techniques, and it has a problem about a point's distance to the hyperplane, even for the RBF kernel.

First, we know that SVM finds an "optimal" w for the hyperplane wx + b = 0.

And the fact is that

w = \sum_{i} \alpha_i \phi(x_i)

where those x_i are the so-called support vectors and those alpha_i are their coefficients. Note the \phi() around each x_i: it is the feature map that transforms x into some high-dimensional space (for RBF, an infinite-dimensional one). And we know that

\phi(x_1) \cdot \phi(x_2) = K(x_1, x_2)

so we can compute

\|w\|^2 = w \cdot w = \sum_i \sum_j \alpha_i \alpha_j \, \phi(x_i) \cdot \phi(x_j) = \sum_i \sum_j \alpha_i \alpha_j \, K(x_i, x_j)

which gives us \|w\| using only kernel evaluations. So, the distance you want should be

svc.decision_function(x) / w_norm

where w_norm is the norm calculated above.
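The derivation above can be sketched in scikit-learn directly. Note that `svc.dual_coef_` stores y_i * alpha_i for each support vector, so the double sum over alpha_i alpha_j K(x_i, x_j) becomes a quadratic form with the kernel matrix of the support vectors (the toy data below is a hypothetical stand-in):

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical toy data; the question's real X, y are not shown
X, y = make_blobs(n_samples=40, centers=2, random_state=0)
svc = svm.SVC(kernel='rbf', C=1.0, gamma=0.1).fit(X, y)

sv = svc.support_vectors_      # the x_i in the derivation
coef = svc.dual_coef_          # y_i * alpha_i, shape (1, n_SV) for 2 classes

# ||w||^2 = sum_i sum_j (y_i a_i)(y_j a_j) K(x_i, x_j), via the kernel trick
K = rbf_kernel(sv, sv, gamma=0.1)   # same gamma as the fitted model
w_norm = np.sqrt(coef @ K @ coef.T).item()

# Relative decision values rescaled to distances in feature space
dist = svc.decision_function(X) / w_norm
```

This yields the distance in the (infinite-dimensional) RBF feature space, not in the original input space, which is why Solution 1 calls the raw decision values a relative distance.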


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 yangjie
Solution 2 shivams