'Spatial Points Outlier Clustering Method
I would like to implement an unsupervised clustering to detect grids (vertical/horizontal lines) for spatial points.
I have tried DBSCAN and it gives subpar results. It is able to pick out the grids as seen in red below:
However, it is not able to completely pick out all the points that form the vertical/horizontal lines and if i relax the parameters of epsilon, it will incorrectly classify more points as noisy (e.g. the bottom left of the picture).
I was wondering if maybe there is a modification model of DBSCAN that uses ellipse instead of circles? Or any other clustering methods recommended for this that does not need to prespecify the number of clusters?
Or is there a better method to identify these points that make the grid? Any help is appreciated.
Solution 1:[1]
You can use an anisotropical DBSCAN by modifying your data this way : value of anisotropy >1 will find vertical clusters and values <1 will find horizontal clusters.
from sklearn.cluster import DBSCAN
def anisotropical_DBSCAN(X, anisotropy, eps, min_samples):
"""ANIsotropic DBSCAN clustering : some documentation would be nice here :)
returns an array with """
X[:, 1] = X[:, 1]*anisotropy
db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
return db
Here is a full example with data :
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
n_samples=750, centers=centers, cluster_std=0.4, random_state=0
)
print(X.shape)
def anisotropical_DBSCAN(X, anisotropy, eps, min_samples):
"""ANIsotropic DBSCAN clustering : some documentation would be nice here :)
returns an array with """
X[:, 1] = X[:, 1]*anisotropy
db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
return db
db = anisotropical_DBSCAN(X, anisotropy = 0.1, eps = 0.1, min_samples = 10)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
# #############################################################################
# Plot result
import matplotlib.pyplot as plt
# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
if k == -1:
# Black used for noise.
col = [0, 0, 0, 1]
class_member_mask = labels == k
xy = X[class_member_mask & core_samples_mask]
plt.plot(
xy[:, 0],
xy[:, 1],
"o",
markerfacecolor=tuple(col),
markeredgecolor="k",
markersize=14,
)
xy = X[class_member_mask & ~core_samples_mask]
plt.plot(
xy[:, 0],
xy[:, 1],
"o",
markerfacecolor=tuple(col),
markeredgecolor="k",
markersize=6,
)
plt.title("Estimated number of clusters: %d" % n_clusters_)
Now change the parameters to db = anisotropical_DBSCAN(X, anisotropy = 10, eps = 1, min_samples = 10)
I had to change eps value because the horizontal scale and vertical scale arent the same, but in your case, you should be able to keep the same (eps, min sample)
for detecting lines
And you get horizontal clusters :
There are also implementations of anisotropical DBSCAN that are probably a lot cleaner https://github.com/gissong/ADCN
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |