Spatial Points Outlier Clustering Method

I would like to use unsupervised clustering to detect grids (vertical/horizontal lines) in a set of spatial points.

I have tried DBSCAN and it gives subpar results. It is able to pick out the grid, shown in red in the accompanying image.

However, it does not pick out all the points that form the vertical/horizontal lines, and if I relax the epsilon parameter, it incorrectly classifies more points as noisy (e.g. in the bottom left of the picture).

I was wondering whether there is a modified version of DBSCAN that uses ellipses instead of circles. Or are there other clustering methods recommended for this that do not need the number of clusters to be specified in advance?

Or is there a better method to identify the points that make up the grid? Any help is appreciated.



Solution 1:[1]

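You can use an anisotropic DBSCAN by rescaling your data before clustering: multiply the y coordinates by an anisotropy factor. Values of anisotropy < 1 compress the vertical axis, so vertically aligned points end up close together and you find vertical clusters; values > 1 stretch it, so you find horizontal clusters.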

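from sklearn.cluster import DBSCAN

def anisotropical_DBSCAN(X, anisotropy, eps, min_samples):
    """Anisotropic DBSCAN: multiply the y coordinates by `anisotropy`,
    then run standard DBSCAN on the scaled points.
    Returns the fitted DBSCAN object (cluster labels are in db.labels_)."""
    X_scaled = X.copy()  # work on a copy so the caller's array is not modified
    X_scaled[:, 1] = X_scaled[:, 1] * anisotropy
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X_scaled)
    return db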
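
On the ellipse part of the question: scaling y by a factor a and then using a circular neighbourhood of radius eps is the same as using, in the original coordinates, an ellipse with horizontal semi-axis eps and vertical semi-axis eps / a. Here is a small numeric check of that equivalence. The helper names in_scaled_circle and in_ellipse and the offset values are made up for illustration; they are not part of scikit-learn.

```python
import numpy as np

def in_scaled_circle(offsets, anisotropy, eps):
    """True where an offset (dx, dy) is within eps after multiplying dy by anisotropy."""
    scaled = offsets * np.array([1.0, anisotropy])
    return np.hypot(scaled[:, 0], scaled[:, 1]) <= eps

def in_ellipse(offsets, anisotropy, eps):
    """True where (dx, dy) lies inside the ellipse with semi-axes
    eps (horizontal) and eps / anisotropy (vertical)."""
    dx, dy = offsets[:, 0], offsets[:, 1]
    return (dx / eps) ** 2 + (dy / (eps / anisotropy)) ** 2 <= 1.0

# Offsets from a point to four hypothetical neighbours (illustrative values).
offsets = np.array([
    [0.0, 0.5],   # straight up: inside the tall ellipse
    [0.5, 0.0],   # straight right: outside, the ellipse is narrow
    [0.05, 0.5],  # mostly up: inside
    [0.0, 1.5],   # straight up but too far: outside
])
anisotropy, eps = 0.1, 0.1  # ellipse is 0.1 wide (semi-axis) and 1.0 tall

circle_result = in_scaled_circle(offsets, anisotropy, eps)
ellipse_result = in_ellipse(offsets, anisotropy, eps)
print(circle_result)
print(ellipse_result)
```

Both tests accept and reject the same neighbours, which is why rescaling one axis is enough to get axis-aligned elliptical neighbourhoods out of plain DBSCAN.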

Here is a full example with data :

import numpy as np

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=750, centers=centers, cluster_std=0.4, random_state=0
)

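print(X.shape)

def anisotropical_DBSCAN(X, anisotropy, eps, min_samples):
    """Anisotropic DBSCAN: multiply the y coordinates by `anisotropy`,
    then run standard DBSCAN on the scaled points.
    Returns the fitted DBSCAN object (cluster labels are in db.labels_)."""
    X_scaled = X.copy()  # work on a copy so X keeps its original coordinates
    X_scaled[:, 1] = X_scaled[:, 1] * anisotropy
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X_scaled)
    return db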


db = anisotropical_DBSCAN(X, anisotropy = 0.1, eps = 0.1, min_samples = 10)

core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)


# #############################################################################
# Plot result
import matplotlib.pyplot as plt

# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]

    class_member_mask = labels == k

    xy = X[class_member_mask & core_samples_mask]
    plt.plot(
        xy[:, 0],
        xy[:, 1],
        "o",
        markerfacecolor=tuple(col),
        markeredgecolor="k",
        markersize=14,
    )

    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(
        xy[:, 0],
        xy[:, 1],
        "o",
        markerfacecolor=tuple(col),
        markeredgecolor="k",
        markersize=6,
    )

plt.title("Estimated number of clusters: %d" % n_clusters_)
plt.show()

You get vertical clusters.

Now change the call to db = anisotropical_DBSCAN(X, anisotropy = 10, eps = 1, min_samples = 10). I had to change the eps value because the horizontal and vertical scales are no longer the same after rescaling, but in your case you should be able to keep the same (eps, min_samples) for detecting both sets of lines.

And you get horizontal clusters.
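
For a grid you would probably run both passes and keep every point that is not noise in at least one of them. The following is only a sketch under assumptions: line_mask is a hypothetical helper (it scales y on a copy and returns a boolean mask), the data is synthetic (a sparse background lattice plus one dense vertical and one dense horizontal line), and the eps / min_samples values were picked for this toy data, not for yours.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def line_mask(X, anisotropy, eps, min_samples):
    """Hypothetical helper: True for points that DBSCAN does not label
    as noise after multiplying y by `anisotropy` (on a copy of X)."""
    scaled = np.array(X, dtype=float)  # copy, so the caller's X is untouched
    scaled[:, 1] *= anisotropy
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(scaled).labels_
    return labels != -1

# Synthetic data, made up for illustration: 16 sparse background points
# on a lattice with spacing 3, plus two dense lines with spacing 0.2.
gx, gy = np.meshgrid(np.arange(0, 10, 3.0), np.arange(0, 10, 3.0))
background = np.column_stack([gx.ravel(), gy.ravel()])
t = np.linspace(0, 9, 46)
vertical = np.column_stack([np.full_like(t, 1.0), t])    # line x = 1
horizontal = np.column_stack([t, np.full_like(t, 2.0)])  # line y = 2
X = np.vstack([background, vertical, horizontal])

# Pass 1: compress y so vertically aligned points become neighbours.
is_vertical = line_mask(X, anisotropy=0.1, eps=0.1, min_samples=5)
# Pass 2: stretch y (with a matching eps) so only horizontal runs survive.
is_horizontal = line_mask(X, anisotropy=10, eps=1.0, min_samples=5)

# A point belongs to the grid if either pass kept it.
is_grid = is_vertical | is_horizontal
print("background points flagged:", is_grid[:16].sum())
print("line points flagged:", is_grid[16:].sum())
```

On real data the compression also pulls unrelated points closer together along the compressed axis, so expect to tune anisotropy, eps and min_samples rather than reuse these numbers.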

There are also existing implementations of anisotropic DBSCAN that are probably a lot cleaner, for example: https://github.com/gissong/ADCN

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1