How to plot sequential data, changing the color according to cluster
I have a dataframe with a date in each row and the cluster that date belongs to (the clustering was done beforehand, based on the temperatures collected each day). I want to plot this data in sequence, like a stacked bar chart, changing the color of each element according to its assigned cluster. Here is my table (the data goes up to 100 days):
Date | order | ClusterNo2 | constant |
---|---|---|---|
2020-08-07 | 1 | 3.0 | 1 |
2020-08-08 | 2 | 0.0 | 1 |
2020-08-09 | 3 | 1.0 | 1 |
2020-08-10 | 4 | 3.0 | 1 |
2020-08-11 | 5 | 1.0 | 1 |
2020-08-12 | 6 | 1.0 | 1 |
2020-08-13 | 7 | 3.0 | 1 |
2020-08-14 | 8 | 2.0 | 1 |
2020-08-15 | 9 | 2.0 | 1 |
2020-08-16 | 10 | 2.0 | 1 |
2020-08-17 | 11 | 2.0 | 1 |
2020-08-18 | 12 | 1.0 | 1 |
2020-08-19 | 13 | 1.0 | 1 |
2020-08-20 | 14 | 0.0 | 1 |
2020-08-21 | 15 | 0.0 | 1 |
2020-08-22 | 16 | 1.0 | 1 |
Note: I can't simply group the data by cluster because the plot must be sequential. I thought about writing code to count the number of consecutive elements in each cluster, but then I would face the same problem when plotting. Does anyone know how to solve this?
The expected result should be something like this (the numbers inside the bars represent the cluster, the x-axis is the time in days, and the bar width is the number of consecutive days with the same cluster):
Solution 1:[1]
You could just plot a normal bar graph, with 1 bar corresponding to 1 day. If you also set the bar width to 1, adjacent bars of the same color will look like one contiguous patch.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# simulate data
total_datapoints = 16
total_clusters = 4
order = np.arange(total_datapoints)
clusters = np.random.randint(0, total_clusters, size=total_datapoints)
# map clusters to colors
cmap = plt.cm.tab10
bounds = np.arange(total_clusters + 1)
norm = BoundaryNorm(bounds, cmap.N)
colors = [cmap(norm(cluster)) for cluster in clusters]
# plot
fig, ax = plt.subplots()
ax.bar(order, np.ones_like(order), width=1, color=colors, align='edge')
# xticks
change_points = np.where(np.diff(clusters) != 0)[0] + 1
change_points = np.unique([0] + change_points.tolist() + [total_datapoints])
ax.set_xticks(change_points)
# annotate clusters
for ii, dx in enumerate(np.diff(change_points)):
    # midpoint of the run; int(xx) is the index of a day inside that run
    xx = change_points[ii] + dx/2
    ax.text(xx, 0.5, str(clusters[int(xx)]), ha='center', va='center')
ax.set_xlabel('Time (days)')
plt.show()
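If you would rather draw one patch per run of consecutive days, as the question describes, a minimal sketch is shown here. It is an alternative to the approach above, not part of the original answer. The helper names `run_lengths` and `plot_runs` are hypothetical, and the `clusters` list is simply the `ClusterNo2` column from the table in the question, typed in by hand.

```python
from itertools import groupby


def run_lengths(labels):
    """Collapse consecutive equal labels into (start, length, label) tuples."""
    runs = []
    start = 0
    for label, group in groupby(labels):
        length = sum(1 for _ in group)
        runs.append((start, length, label))
        start += length
    return runs


def plot_runs(runs, ax=None):
    """Draw one horizontal bar per run, labelled with its cluster number."""
    # Imported here so that run_lengths can be used without matplotlib.
    import matplotlib.pyplot as plt

    if ax is None:
        _, ax = plt.subplots()
    cmap = plt.cm.tab10  # assumes at most 10 clusters
    for start, length, label in runs:
        ax.barh(0, length, left=start, height=1,
                color=cmap(int(label)), edgecolor='white')
        ax.text(start + length / 2, 0, str(int(label)),
                ha='center', va='center')
    # One tick at the start of every run, plus one at the very end.
    last_start, last_length, _ = runs[-1]
    ax.set_xticks([start for start, _, _ in runs] + [last_start + last_length])
    ax.set_yticks([])
    ax.set_xlabel('Time (days)')
    return ax


# ClusterNo2 values from the table in the question (16 days).
clusters = [3, 0, 1, 3, 1, 1, 3, 2, 2, 2, 2, 1, 1, 0, 0, 1]
runs = run_lengths(clusters)
```

Calling `plot_runs(runs)` followed by `plt.show()` draws the chart. With a dataframe, something like `run_lengths(df['ClusterNo2'].astype(int))` should produce the same runs, assuming the rows are already sorted by date.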
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow