Changing label names of KMeans clusters
I am doing k-means clustering with sklearn in Python, and I am wondering how to change the generated label names for the clusters. For example:
data Cluster
0.2344 1
1.4537 2
2.4428 2
5.7757 3
And I want to get:
data Cluster
0.2344 black
1.4537 red
2.4428 red
5.7757 blue
I don't mean mapping 1 -> black, 2 -> red directly when printing. I am wondering whether it is possible to set different cluster names in the KMeans model itself, by default.
Solution 1:[1]
No
There isn't any way to change the default labels.
You have to map them separately using a dictionary.
You can take a look at all the available methods in the KMeans documentation.
None of the available methods or attributes allows you to change the default labels.
Solution using dictionary:
# Code
a = [0,0,1,1,2,2]
mapping = {0:'black', 1:'red', 2:'blue'}
a = [mapping[i] for i in a]
# Output
['black', 'black', 'red', 'red', 'blue', 'blue']
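If your labels are already in a NumPy array (as kmeans.labels_ is), the same lookup can also be done with array indexing. This is only a sketch of an alternative to the dictionary: the labels array below is a hand-written stand-in for kmeans.labels_, and it assumes the labels are the integers 0..k-1, which is what KMeans produces.

```python
import numpy as np

# Stand-in for kmeans.labels_ (assumed to hold integers 0..k-1).
labels = np.array([0, 0, 1, 1, 2, 2])

# names[i] is the name you want for cluster i.
names = np.array(["black", "red", "blue"])

# Indexing with the label array looks up one name per sample.
named = names[labels]
print(named.tolist())  # ['black', 'black', 'red', 'red', 'blue', 'blue']
```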
If you change your data or the number of clusters, the mapping has to change as well.
To see why, first look at a visualization:
Code:
Importing and generating random data:
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(0, 100, size=(10, 2))  # 10 random 2-D points in [0, 100)
Applying the KMeans algorithm:
kmeans = KMeans(n_clusters=3, random_state=0).fit(x)
Getting the cluster centers:
arr = kmeans.cluster_centers_
Your cluster centroids look like this:
array([[23.81072765, 77.21281171],
[ 8.6140551 , 23.15597377],
[93.37177176, 32.21581703]])
Here, the 1st row is the centroid of cluster 0, the 2nd row is the centroid of cluster 1, and so on.
Visualizing centroids and data:
plt.scatter(x[:,0],x[:,1])
plt.scatter(arr[:,0], arr[:,1])
You get a scatter plot showing the data points and the cluster centroids.
As you can see, you have access to the centroids as well as the training data. If your training data, number of clusters and random_state stay constant, these centroids don't change.
But if you add more training data or more clusters, you will have to create a new mapping according to the centroids that are generated.
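One way to build that mapping from the centroids rather than by hand is to sort the clusters by a centroid coordinate and hand out the names in that order. The sketch below is just an illustration: names_by_centroid_order is a hypothetical helper (not part of sklearn), and sorting by the first coordinate is an arbitrary choice. It uses the centroids printed above; in practice you would pass kmeans.cluster_centers_.

```python
import numpy as np

def names_by_centroid_order(centers, names, axis=0):
    """Hypothetical helper: map each cluster label to a name,
    ordering clusters by one coordinate of their centroid."""
    centers = np.asarray(centers)
    if len(names) != len(centers):
        raise ValueError("need exactly one name per cluster")
    # order[0] is the label of the cluster with the smallest coordinate.
    order = np.argsort(centers[:, axis])
    return {int(label): name for label, name in zip(order, names)}

# Centroids copied from the example above (row i belongs to cluster i).
centers = np.array([[23.81072765, 77.21281171],
                    [8.6140551, 23.15597377],
                    [93.37177176, 32.21581703]])

mapping = names_by_centroid_order(centers, ["black", "red", "blue"])
print(mapping)  # {1: 'black', 0: 'red', 2: 'blue'}
```

With this, the leftmost cluster is always 'black' no matter which integer KMeans happens to assign to it after a refit.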
Solution 2:[2]
Check out the top response on this related post.
sklearn doesn't include this functionality, but you can map the values to your dataframe in a fairly straightforward manner.
current_labels = [1, 2, 3]
desired_labels = ['black', 'red', 'blue']
# create a dictionary for your corresponding values
map_dict = dict(zip(current_labels, desired_labels))
print(map_dict)
# {1: 'black', 2: 'red', 3: 'blue'}
# map the desired values back to the dataframe
# note this will replace the original values
data['Cluster'] = data['Cluster'].map(map_dict)
# alternatively (instead of the line above), map to a new column if you want to preserve the old values
data['NewNames'] = data['Cluster'].map(map_dict)
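For completeness, here is a small self-contained version of the above using the values from the question. The DataFrame is built by hand here as an assumption about what your data looks like. One thing to watch for: Series.map gives NaN for any label that is missing from the dictionary, so it can be worth checking for that.

```python
import pandas as pd

# Hand-built frame using the values from the question (assumed layout).
data = pd.DataFrame({
    "data": [0.2344, 1.4537, 2.4428, 5.7757],
    "Cluster": [1, 2, 2, 3],
})

map_dict = {1: "black", 2: "red", 3: "blue"}

# Keep the numeric labels and put the names in a new column.
data["NewNames"] = data["Cluster"].map(map_dict)

# Labels missing from map_dict would show up as NaN here.
missing = data["NewNames"].isna().sum()
print(data)
print("unmapped rows:", missing)
```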
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | gojandrooo |