How to add custom annotations to a stacked bar
I am trying to annotate each segment of a stacked histogram in Seaborn with its hue value, for readability. The sample data and my current code are below:
Sample data: https://easyupload.io/as5uxs
Current code to organize and display the plot:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# create the dataframe - from sample data file
data = {'brand': ['Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'BMW', 'BMW', 'BMW', 'BMW', 'BMW', 'GM', 'GM', 'GM', 'GM', 'GM', 'GM', 'Toyota', 'Toyota'],
'Model': ['A3', 'A3', 'A3', 'A5', 'A5', 'RS5', 'RS5', 'RS5', 'RS5', 'M3', 'M3', 'M3', 'X1', 'X1', 'Chevy', 'Chevy', 'Chevy', 'Chevy', 'Caddy', 'Caddy', 'Camry', 'Corolla']}
data = pd.DataFrame(data)
# make the column categorical, using the order of the 'value_counts'
data['brand'] = pd.Categorical(data['brand'], data['brand'].value_counts(sort=True).index)
# We want to sort the hue value (model) alphabetically
hue_order = data['Model'].unique()
hue_order.sort()
f, ax = plt.subplots(figsize=(10, 6))
sns.histplot(data, x="brand", hue="Model", multiple="stack", edgecolor=".3", linewidth=.5, hue_order=hue_order, ax=ax)
This generates a nice plot with an ordered legend and ordered bars. However, when I try to annotate it, I can't seem to get it to work. I want each annotation to show the hue, followed by the height of that bar segment (the number of vehicles of that model). For example, for the first bar, I would want it to display RS5 x 4 in the first grey shaded cell to show 4 vehicles of the RS5 model, and so on for each segment of the stacked histogram.
I've tried a lot of methods and am struggling to get this to work. I've tried using:
for i, rect in enumerate(ax.patches):
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()
    # The height of the bar is the count value and can be used as the label
    label_text = f'{height:.0f}'
    label_x = x + width / 2
    label_y = y + height / 2
    # don't include label if it's equivalently 0
    if height > 0.001:
        ax.text(label_x, label_y, label_text, ha='center', va='center', fontsize=8)
Current result: [plot image not included]
This only displays the height of each segment, which is great, but I am not sure how to display the correct hue text alongside that height.
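The two ordering steps in the question's code (a categorical 'brand' column ordered by value_counts, and an alphabetically sorted hue order) can be checked on their own, without plotting. A minimal sketch using the same sample data; it uses sorted() in place of the in-place hue_order.sort() call, which should produce the same order:

```python
import pandas as pd

# same 22 rows as the sample data in the question
data = pd.DataFrame({
    'brand': ['Audi'] * 9 + ['BMW'] * 5 + ['GM'] * 6 + ['Toyota'] * 2,
    'Model': ['A3', 'A3', 'A3', 'A5', 'A5', 'RS5', 'RS5', 'RS5', 'RS5',
              'M3', 'M3', 'M3', 'X1', 'X1',
              'Chevy', 'Chevy', 'Chevy', 'Chevy', 'Caddy', 'Caddy',
              'Camry', 'Corolla'],
})

# categories follow value_counts order, i.e. most frequent brand first
data['brand'] = pd.Categorical(data['brand'], data['brand'].value_counts(sort=True).index)
brand_order = list(data['brand'].cat.categories)

# hue levels sorted alphabetically
hue_order = sorted(data['Model'].unique())

print(brand_order)  # ['Audi', 'GM', 'BMW', 'Toyota']
print(hue_order)    # ['A3', 'A5', 'Caddy', 'Camry', 'Chevy', 'Corolla', 'M3', 'RS5', 'X1']
```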
Solution 1:[1]
- To create the desired annotation, it's necessary to know the order in which the bar sections are created, which is difficult because seaborn does a lot behind the scenes. It's easier to plot a reshaped dataframe directly, because the column and row order is explicit.
  - This is just a count plot, not a histogram, so it's easier to reshape the dataframe with pd.crosstab, which creates a wide dataframe with 'brand' as the index, 'Model' as the columns, and the counts as the values.
  - When the bar plot is created, all the values in each dataframe column are plotted in succession: all of column 'A3' is plotted, then 'A5', etc. Since the sequence of the columns is known, it's easy to extract the correct column name to add to the annotation.
  - Use enumerate(ax.containers), and then use i to index cols (e.g. cols[i]). There are 9 BarContainers, one for each column.
  - This implementation won't work with enumerate(ax.patches), because there are 36 patches (4 brands x 9 models), not one per column.
- seaborn is an API for matplotlib, and pandas uses matplotlib as the default plotting backend.
- matplotlib >= 3.4.2 has .bar_label for annotations.
  - See this answer for more information and examples.
- Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1
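The container-versus-patch distinction described above can be checked directly. A minimal sketch, assuming matplotlib's non-interactive Agg backend (so it runs without a display) and leaving 'brand' as plain strings, so rows are sorted alphabetically; it confirms there is one container per crosstab column and that container i holds the segments of cols[i]:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; no window is opened
import pandas as pd

# same 22 rows as the sample data
df = pd.DataFrame({
    'brand': ['Audi'] * 9 + ['BMW'] * 5 + ['GM'] * 6 + ['Toyota'] * 2,
    'Model': ['A3', 'A3', 'A3', 'A5', 'A5', 'RS5', 'RS5', 'RS5', 'RS5',
              'M3', 'M3', 'M3', 'X1', 'X1',
              'Chevy', 'Chevy', 'Chevy', 'Chevy', 'Caddy', 'Caddy',
              'Camry', 'Corolla'],
})

ct = pd.crosstab(df.brand, df.Model)  # 4 brands x 9 models
cols = ct.columns

ax = ct.plot(kind='bar', stacked=True)

n_containers = len(ax.containers)  # one BarContainer per column
n_patches = len(ax.patches)        # one Rectangle per (brand, model) cell
print(n_containers, n_patches)     # 9 36

# container i holds the segments of column cols[i], one per brand row
for i, c in enumerate(ax.containers):
    heights = [bar.get_height() for bar in c]
    assert heights == ct[cols[i]].tolist()
```

Indexing cols by the container position only works because the containers are created column by column; the same loop over ax.patches would need i // 4 (the number of rows) to recover the column, which is easy to get wrong.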
Setup and Reshape
import pandas as pd
import matplotlib.pyplot as plt

# create the dataframe
data = {'brand': ['Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'BMW', 'BMW', 'BMW', 'BMW', 'BMW', 'GM', 'GM', 'GM', 'GM', 'GM', 'GM', 'Toyota', 'Toyota'],
'Model': ['A3', 'A3', 'A3', 'A5', 'A5', 'RS5', 'RS5', 'RS5', 'RS5', 'M3', 'M3', 'M3', 'X1', 'X1', 'Chevy', 'Chevy', 'Chevy', 'Chevy', 'Caddy', 'Caddy', 'Camry', 'Corolla']}
df = pd.DataFrame(data)
# sort brand by value counts
df['brand'] = pd.Categorical(df['brand'], df['brand'].value_counts(sort=True).index)
# reshape the dataframe and get count of each model per brand
ct = pd.crosstab(df.brand, df.Model)
# create a variable for the column names
cols = ct.columns
# display(ct)
Model   A3  A5  Caddy  Camry  Chevy  Corolla  M3  RS5  X1
brand
Audi     3   2      0      0      0        0   0    4   0
GM       0   0      2      0      4        0   0    0   0
BMW      0   0      0      0      0        0   3    0   2
Toyota   0   0      0      1      0        1   0    0   0
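pd.crosstab here amounts to counting each (brand, Model) pair and pivoting the models into columns. As a cross-check only (this groupby/size/unstack route is an alternative, not part of the original answer), the same table can be rebuilt another way. The sketch keeps 'brand' as plain strings rather than categorical, so both results are sorted alphabetically:

```python
import pandas as pd

# same 22 rows as the sample data
df = pd.DataFrame({
    'brand': ['Audi'] * 9 + ['BMW'] * 5 + ['GM'] * 6 + ['Toyota'] * 2,
    'Model': ['A3', 'A3', 'A3', 'A5', 'A5', 'RS5', 'RS5', 'RS5', 'RS5',
              'M3', 'M3', 'M3', 'X1', 'X1',
              'Chevy', 'Chevy', 'Chevy', 'Chevy', 'Caddy', 'Caddy',
              'Camry', 'Corolla'],
})

ct = pd.crosstab(df.brand, df.Model)

# count each (brand, Model) pair, then move Model into the columns;
# pairs that never occur are filled with 0 instead of NaN
alt = df.groupby(['brand', 'Model']).size().unstack(fill_value=0)

same_rows = list(ct.index) == list(alt.index)
same_cols = list(ct.columns) == list(alt.columns)
same_counts = bool((ct.to_numpy() == alt.to_numpy()).all())
print(same_rows, same_cols, same_counts)  # True True True
```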
Plot and Annotate
# plot the dataframe, which uses matplotlib as the backend (seaborn is just an api for matplotlib)
ax = ct.plot(kind='bar', stacked=True, width=1, ec='k', figsize=(10, 6), rot=0)
# iterate through each container and add custom annotations
for i, c in enumerate(ax.containers):
    # customize the label to account for cases when there might not be a bar section - with assignment expression (h := ...)
    labels = [f'{cols[i]}: {h:0.0f}' if (h := v.get_height()) > 0 else '' for v in c]
    # without assignment expression v.get_height() must be called twice
    # labels = [f'{cols[i]}: {v.get_height():0.0f}' if v.get_height() > 0 else '' for v in c]

    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center', fontsize=8)
plt.show()
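Because ax.bar_label returns the list of Annotation objects it creates, the placed labels can also be collected and checked after plotting, for example in a headless script or a test. A minimal sketch assuming matplotlib >= 3.4 and the Agg backend; drawn is a hypothetical name used only here:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# same 22 rows as the sample data
df = pd.DataFrame({
    'brand': ['Audi'] * 9 + ['BMW'] * 5 + ['GM'] * 6 + ['Toyota'] * 2,
    'Model': ['A3', 'A3', 'A3', 'A5', 'A5', 'RS5', 'RS5', 'RS5', 'RS5',
              'M3', 'M3', 'M3', 'X1', 'X1',
              'Chevy', 'Chevy', 'Chevy', 'Chevy', 'Caddy', 'Caddy',
              'Camry', 'Corolla'],
})
df['brand'] = pd.Categorical(df['brand'], df['brand'].value_counts(sort=True).index)
ct = pd.crosstab(df.brand, df.Model)
cols = ct.columns

ax = ct.plot(kind='bar', stacked=True, width=1, ec='k', figsize=(10, 6), rot=0)

drawn = []  # hypothetical: every non-empty label text that was placed
for i, c in enumerate(ax.containers):
    labels = [f'{cols[i]}: {h:0.0f}' if (h := v.get_height()) > 0 else '' for v in c]
    # bar_label returns the list of Annotation objects it created
    annotations = ax.bar_label(c, labels=labels, label_type='center', fontsize=8)
    drawn.extend(a.get_text() for a in annotations if a.get_text())

plt.close(ax.figure)
print(drawn)
```

The labels come out in container order (column by column), so drawn lists the models alphabetically, each with its count.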
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |