How to use Manhattan distance for SpectralClustering in sklearn
I am trying to use the Manhattan distance with SpectralClustering() in sklearn. I set the affinity parameter to 'manhattan', but I get the following error:
ValueError: Unknown kernel 'manhattan'
What is the proper kernel name to use? Can anyone help? Basically, I want to use SpectralClustering to run kmeans with the Manhattan distance metric.
Here is the line of code that sets up SpectralClustering():
clustering = SpectralClustering(n_clusters=10, affinity='manhattan', assign_labels="kmeans")
clustering.fit(X)
Solution 1:[1]
The Manhattan distance is not supported by sklearn.metrics.pairwise_kernels, which is the reason for the ValueError.
Valid values for metric include:
['rbf', 'sigmoid', 'polynomial', 'poly', 'linear', 'cosine']
The linear kernel and the manhattan distance metric are different, as the following example shows:
>>> import numpy as np
>>> from sklearn.metrics import pairwise_distances
>>> from sklearn.metrics.pairwise import pairwise_kernels
>>> X = np.array([[2, 3], [3, 5], [5, 8]])
>>> Y = np.array([[1, 0], [2, 1]])
>>> pairwise_distances(X, Y, metric='manhattan')
array([[ 4.,  2.],
       [ 7.,  5.],
       [12., 10.]])
>>> pairwise_kernels(X, Y, metric='linear')
array([[ 2.,  7.],
       [ 3., 11.],
       [ 5., 18.]])
The Manhattan distance is available through sklearn.metrics.pairwise_distances.
A simple way to use the Manhattan distance with spectral clustering is to precompute the matrix and pass it with affinity='precomputed':
>>> from sklearn.cluster import SpectralClustering
>>> from sklearn.metrics import pairwise_distances
>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [1, 0],
...               [4, 7], [3, 5], [3, 6]])
>>> X_precomputed = pairwise_distances(X, metric='manhattan')
>>> clustering = SpectralClustering(n_clusters=2, affinity='precomputed', assign_labels="discretize", random_state=0)
>>> clustering.fit(X_precomputed)
>>> clustering.labels_
>>> clustering
Solution 2:[2]
The official documentation on Spectral Clustering says that you can use any kernel supported by sklearn.metrics.pairwise_kernels. Unfortunately, 'manhattan' is not among those kernels.
If something similar suffices, you could use the linear kernel like this:
clustering = SpectralClustering(n_clusters=10, affinity='linear', assign_labels="kmeans")
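One option worth checking, offered here as a suggestion and not as part of the original answer: scikit-learn provides laplacian_kernel in sklearn.metrics.pairwise, defined as exp(-gamma * ||x - y||_1), which is a similarity built on the Manhattan distance. The sketch below is a minimal illustration; the toy data and gamma=0.5 are arbitrary assumptions, not recommended values.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import laplacian_kernel

# Toy data (illustrative only): two well-separated groups of points.
X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])

gamma = 0.5  # arbitrary bandwidth chosen for this sketch; tune for real data

# Laplacian kernel: exp(-gamma * L1 distance), so every entry lies in (0, 1].
affinity = laplacian_kernel(X, gamma=gamma)

# Sanity check: identical to exponentiating the Manhattan distance matrix.
manhattan = pairwise_distances(X, metric='manhattan')
same = np.allclose(affinity, np.exp(-gamma * manhattan))

# Feed the similarity matrix to SpectralClustering as a precomputed affinity.
clustering = SpectralClustering(n_clusters=2, affinity='precomputed',
                                assign_labels='kmeans', random_state=0)
labels = clustering.fit_predict(affinity)
```

Unlike the linear kernel, this keeps the L1 geometry of the data: points that are close in Manhattan distance get an affinity near 1, and distant points get an affinity near 0.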
Solution 3:[3]
The elements of the precomputed matrix should be similarities rather than distances. You can use a Gaussian kernel to do this transformation.
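A minimal sketch of that transformation, assuming the common form exp(-d^2 / (2 * sigma^2)) applied element-wise to the Manhattan distance matrix. The bandwidth sigma=2.0 and the toy data are assumptions made for illustration; the answer does not specify either.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import pairwise_distances

# Toy data (illustrative only): two well-separated groups of points.
X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])

# Step 1: pairwise Manhattan (L1) distances, shape (n_samples, n_samples).
distances = pairwise_distances(X, metric='manhattan')

# Step 2: Gaussian transform turns distances into similarities in (0, 1].
# sigma is a hypothetical bandwidth; it controls how fast similarity decays.
sigma = 2.0
affinity = np.exp(-distances ** 2 / (2.0 * sigma ** 2))

# Step 3: cluster on the similarity matrix, not on the raw distances.
clustering = SpectralClustering(n_clusters=2, affinity='precomputed',
                                assign_labels='kmeans', random_state=0)
labels = clustering.fit_predict(affinity)
```

After the transform, a distance of 0 maps to a similarity of 1 and large distances map to values near 0, which is the orientation that affinity='precomputed' expects.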
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Venkatachalam |
Solution 2 | |
Solution 3 | neo |