How to find cut-off height in agglomerative clustering with a predefined number of clusters in sklearn?
I'm fitting sklearn's agglomerative (hierarchical) clustering with the following code:
AgglomerativeClustering(compute_distances=True, n_clusters=15, linkage='complete', affinity='cosine').fit(X_scaled)
How can I extract the exact height at which the dendrogram has been cut off to create the 15 clusters?
Solution 1:[1]
Try this code with your feature data set X to see how the number of clusters varies with the cut height:
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, cut_tree

heights = np.arange(0, 20)
n_clusters = np.zeros(len(heights))

# Build the hierarchy once, then cut it at each candidate height
linked = linkage(X, metric="euclidean", method="average")
for i, d in enumerate(heights):
    t = cut_tree(linked, height=d)
    n_clusters[i] = len(np.unique(t))

plt.plot(n_clusters, heights, '-o')
plt.grid()
plt.xlabel('k')
plt.ylabel('height')
plt.show()
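If you want the exact interval for a single k instead of scanning a whole range of heights, you can read it straight off the SciPy linkage matrix: column 2 of linked holds the merge heights in order, and any cut between the height of the merge that leaves k clusters and the height of the next merge yields exactly k clusters. A minimal sketch, assuming X is the scaled feature matrix from the question and the two boundary merge heights are distinct:

import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

k = 15
# Mirror the question's settings (complete linkage, cosine distance)
linked = linkage(X, metric="cosine", method="complete")

# linked[-k, 2] is the height of the merge that reduces the tree to k
# clusters; linked[-(k - 1), 2] is the height of the next merge, which
# would reduce it to k - 1. Any cut in between yields exactly k clusters.
low, high = linked[-k, 2], linked[-(k - 1), 2]
print(f"cut anywhere in [{low:.4f}, {high:.4f}) to get {k} clusters")

# Sanity check: cutting at the midpoint of the interval recovers k clusters
labels = cut_tree(linked, height=(low + high) / 2)
assert len(np.unique(labels)) == k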
Or, try this code with your feature data set X to see how the number of clusters varies with sklearn's distance threshold:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

distance = np.arange(0, 20, 0.1)
n_clusters = np.zeros(len(distance))

# Refit with a distance threshold instead of a fixed k and record
# how many clusters each threshold produces
for i, d in enumerate(distance):
    cluster = AgglomerativeClustering(distance_threshold=d, n_clusters=None,
                                      affinity='euclidean', linkage='ward')
    cluster.fit(X)
    n_clusters[i] = cluster.n_clusters_

plt.plot(n_clusters, distance, '-o')
plt.grid()
plt.xlabel('k')
plt.ylabel('distance')
plt.show()
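To answer the original question directly: because the model was fitted with compute_distances=True, the merge heights are already stored on the fitted estimator as model.distances_, one entry per row of model.children_. With n_clusters=15 the full tree is still built (compute_full_tree='auto' resolves to True whenever n_clusters is below 100), and complete linkage is monotone, so the 15-cluster cut height can be read off the sorted distances. A sketch mirroring the question's call (note that newer scikit-learn versions rename the affinity parameter to metric):

import numpy as np
from sklearn.cluster import AgglomerativeClustering

k = 15
model = AgglomerativeClustering(compute_distances=True, n_clusters=k,
                                linkage='complete', affinity='cosine').fit(X_scaled)

# One merge height per merge; complete linkage is monotone, so sorting
# is a no-op here, but it keeps the indexing safe in the general case.
d = np.sort(model.distances_)

# d[-k] is the height of the merge that leaves k clusters; d[-(k - 1)]
# is the height of the next merge. Cutting anywhere in between
# reproduces the 15 clusters the model returned.
low, high = d[-k], d[-(k - 1)]
print(f"{k} clusters correspond to any cut height in [{low:.4f}, {high:.4f})")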
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |