How to create and annotate a stacked proportional bar chart
I'm struggling to create a stacked bar chart derived from the value_counts() of the columns of a dataframe.
Assume a dataframe like the following, where responder is not important, but I would like to stack the counts of [1,2,3,4,5] for all q# columns.
responder, q1, q2, q3, q4, q5
------------------------------
r1, 5, 3, 2, 4, 1
r2, 3, 5, 1, 4, 2
r3, 2, 1, 3, 4, 5
r4, 1, 4, 5, 3, 2
r5, 1, 2, 5, 3, 4
r6, 2, 3, 4, 5, 1
r7, 4, 3, 2, 1, 5
The result should be a stacked bar chart in which each bar is labeled by q# and contains 5 sections, one for the count of each of [1,2,3,4,5] in the data.
Ideally, all bars will be "100%" wide, showing each count as a proportion of the bar. It's guaranteed that each responder row will have one entry for each question, so the percentage is just a bonus if possible.
Any help would be much appreciated, with a slight preference for a matplotlib solution.
Solution 1:[1]
You can calculate the width of each bar segment as a proportion and obtain the stacked bar plot using ax = percents.T.plot(kind='barh', stacked=True), where percents is a DataFrame with q1,...,q5 as columns and 1,...,5 as indices. Each column of counts is divided by its own total, so the five segments of every bar sum to 1.
>>> percents
       q1      q2      q3      q4      q5
1  0.2015  0.2040  0.2115  0.1995  0.2070
2  0.2070  0.1905  0.2070  0.2070  0.1965
3  0.1965  0.2115  0.1795  0.1905  0.1935
4  0.1970  0.1970  0.1885  0.1960  0.2090
5  0.1980  0.1970  0.2135  0.2070  0.1940
Then you can iterate over ax.patches to add labels to every segment. The count labels can be generated from the original counts DataFrame: counts = df.apply(lambda x: x.value_counts())
>>> counts
    q1   q2   q3   q4   q5
1  403  408  423  399  414
2  414  381  414  414  393
3  393  423  359  381  387
4  394  394  377  392  418
5  396  394  427  414  388
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
## create some data similar to yours
np.random.seed(42)
categories = ['q1','q2','q3','q4','q5']
df = pd.DataFrame(np.random.randint(1,6,size=(2000, 5)), columns=categories)
## counts will be used for the labels
counts = df.apply(lambda x: x.value_counts())
## percents will be used to determine the width of each bar segment;
## divide each column by its own total so that every bar sums to 1
percents = counts.div(counts.sum(axis=0), axis=1)
counts_array = counts.values
nrows, ncols = counts_array.shape
## ax.patches is ordered by answer value first, then by question,
## which matches the row-major order of counts_array
indices = [(i,j) for i in range(0,nrows) for j in range(0,ncols)]
ax = percents.T.plot(kind='barh', stacked=True)
ax.legend(bbox_to_anchor=(1, 1.01), loc='upper right')
for i, p in enumerate(ax.patches):
    ## the patch width is a fraction, so scale it by 100 before adding the % sign
    ax.annotate(f"({p.get_width() * 100:.2f}%)", (p.get_x() + p.get_width() - 0.15, p.get_y() - 0.10), xytext=(5, 10), textcoords='offset points')
    ax.annotate(str(counts_array[indices[i]]), (p.get_x() + p.get_width() - 0.15, p.get_y() + 0.10), xytext=(5, 10), textcoords='offset points')
plt.show()
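As a sanity check on the normalisation axis, the counting step can also be run on the 7-row sample from the question. This is only a sketch: the reindex/fillna step is an assumption added here, because an answer value that never occurs in a column (for example 2 in q4) would otherwise show up as NaN in counts. Dividing each column by its own total makes every question's shares sum to 1.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    'responder': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'],
    'q1': [5, 3, 2, 1, 1, 2, 4],
    'q2': [3, 5, 1, 4, 2, 3, 3],
    'q3': [2, 1, 3, 5, 5, 4, 2],
    'q4': [4, 4, 4, 3, 3, 5, 1],
    'q5': [1, 2, 5, 2, 4, 1, 5],
}
df = pd.DataFrame(data).set_index('responder')

# Count each answer value per question; values that never occur become 0
# (assumption: the possible answers are exactly 1..5).
counts = (
    df.apply(lambda s: s.value_counts())
      .reindex([1, 2, 3, 4, 5])
      .fillna(0)
      .astype(int)
)

# Divide each column (question) by its own total so the shares sum to 1.
percents = counts.div(counts.sum(axis=0), axis=1)

print(counts)
print(percents.sum(axis=0))
```

With this layout, percents.T has one row per question, which is the shape that plot(kind='barh', stacked=True) turns into one bar per question.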
Solution 2:[2]
- From matplotlib 3.4.2, use matplotlib.pyplot.bar_label.
- pro = df.div(df.sum(axis=1), axis=0) creates a dataframe of proportions relative to each row. Note the importance of summing and dividing along the correct axis.
- Use pandas.DataFrame.plot with kind='barh' and stacked=True to plot the pro dataframe, which will create an x-axis with the correct range (0 - 1). matplotlib is the default plotting backend.
- .bar_label has a labels parameter, which accepts custom labels.
- labels is created with a list comprehension, where the values (vals) from df are combined with the values from pro for each bar patch.
- (w := v.get_width()) > 0 can be used to conditionally show annotations, greater than 0 in this case.
- := is an assignment expression, available from python 3.8.
- labels = [f'{val}\n({w.get_width()*100:.1f}%)' for w, val in zip(c, vals)] can be used if there's no need to check the patch size.
- labels = [f'{val}\n({w.get_width()*100:.1f}%)' if w.get_width() > 0 else '' for w, val in zip(c, vals)] works without the :=, but requires using .get_width() twice.
- Tested in python 3.10, pandas 1.3.5, matplotlib 3.5.1, seaborn 0.11.2
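The two ways of building labels can be compared without drawing anything. The sketch below uses a hypothetical FakePatch class as a stand-in for a matplotlib bar patch (only get_width() is mimicked), and the widths and values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakePatch:
    """Hypothetical stand-in for a bar patch; only get_width() is mimicked."""
    width: float

    def get_width(self) -> float:
        return self.width


# Illustrative segment widths (proportions) and the raw values they represent.
c = [FakePatch(0.333333), FakePatch(0.0), FakePatch(0.066667)]
vals = [5, 0, 1]

# With the assignment expression (Python 3.8+): get_width() is called once per patch.
labels_walrus = [
    f'{val}\n({w*100:.1f}%)' if (w := v.get_width()) > 0 else ''
    for v, val in zip(c, vals)
]

# Without it: same result, but get_width() is called twice per non-empty patch.
labels_plain = [
    f'{val}\n({v.get_width()*100:.1f}%)' if v.get_width() > 0 else ''
    for v, val in zip(c, vals)
]

print(labels_walrus == labels_plain)
```

Both lists contain an empty string for the zero-width segment, so no label is drawn for it.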
import pandas as pd
# sample dataframe from OP
data = {'responder': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'], 'q1': [5, 3, 2, 1, 1, 2, 4], 'q2': [3, 5, 1, 4, 2, 3, 3], 'q3': [2, 1, 3, 5, 5, 4, 2], 'q4': [4, 4, 4, 3, 3, 5, 1], 'q5': [1, 2, 5, 2, 4, 1, 5]}
# The labels to be on the y-axis should be set as the index
# If the column names and index need to be swapped, use .T to transpose the dataframe
df = pd.DataFrame(data).set_index('responder')
# create dataframe with proportions
pro = df.div(df.sum(axis=1), axis=0)
# plot
ax = pro.plot(kind='barh', figsize=(12, 10), stacked=True)
# move legend
ax.legend(bbox_to_anchor=(1, 1.01), loc='upper left')
# column names from pro used to get the column values from df
cols = pro.columns
# iterate through each group of containers and the corresponding column name
for c, col in zip(ax.containers, cols):
    # get the values for the column from df
    vals = df[col]
    # create a custom label for bar_label
    labels = [f'{val}\n({w*100:.1f}%)' if (w := v.get_width()) > 0 else '' for v, val in zip(c, vals)]
    # annotate each section with the custom labels
    ax.bar_label(c, labels=labels, label_type='center', fontweight='bold')
- Transposing df with df = pd.DataFrame(data).set_index('responder').T swaps the index and columns, so the plot has one bar per question instead of one per responder. figsize=(12, 10) may need to be adjusted.
DataFrames
df
q1 q2 q3 q4 q5
responder
r1 5 3 2 4 1
r2 3 5 1 4 2
r3 2 1 3 4 5
r4 1 4 5 3 2
r5 1 2 5 3 4
r6 2 3 4 5 1
r7 4 3 2 1 5
pro
q1 q2 q3 q4 q5
responder
r1 0.333333 0.200000 0.133333 0.266667 0.066667
r2 0.200000 0.333333 0.066667 0.266667 0.133333
r3 0.133333 0.066667 0.200000 0.266667 0.333333
r4 0.066667 0.266667 0.333333 0.200000 0.133333
r5 0.066667 0.133333 0.333333 0.200000 0.266667
r6 0.133333 0.200000 0.266667 0.333333 0.066667
r7 0.266667 0.200000 0.133333 0.066667 0.333333
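The code in this solution draws one bar per responder. If the goal is one bar per q# column with a segment for each answer value, as the question describes, the same bar_label pattern could be applied to a table of counts instead. The following is a sketch under that assumption, not the answer's own code: the counts table, the reindex over 1..5, the legend title and the Agg backend (chosen so the script runs without a display) are all choices made here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import pandas as pd

# Sample data copied from the question.
data = {
    'responder': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'],
    'q1': [5, 3, 2, 1, 1, 2, 4],
    'q2': [3, 5, 1, 4, 2, 3, 3],
    'q3': [2, 1, 3, 5, 5, 4, 2],
    'q4': [4, 4, 4, 3, 3, 5, 1],
    'q5': [1, 2, 5, 2, 4, 1, 5],
}
df = pd.DataFrame(data).set_index('responder')

# Rows: questions, columns: answer values 1..5, cells: how often each answer occurs.
counts = (
    df.apply(lambda s: s.value_counts())
      .reindex([1, 2, 3, 4, 5])
      .fillna(0)
      .astype(int)
      .T
)

# Proportions relative to each row (question), so every bar spans 0 to 1.
pro = counts.div(counts.sum(axis=1), axis=0)

ax = pro.plot(kind='barh', stacked=True, figsize=(10, 5))
ax.legend(title='answer', bbox_to_anchor=(1, 1.01), loc='upper left')

# One container per answer value; label each segment with its count and share.
for c, col in zip(ax.containers, pro.columns):
    vals = counts[col]
    labels = [f'{val}\n({w*100:.1f}%)' if (w := v.get_width()) > 0 else ''
              for v, val in zip(c, vals)]
    ax.bar_label(c, labels=labels, label_type='center')

ax.figure.savefig('stacked_counts.png', bbox_inches='tight')
```

Segments with a count of 0 (for example answer 2 in q4) get an empty label, which is what the width check is for.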
Referenced
- How to put the legend out of the plot shows various ways to format and move the legend.
- Adding value labels on a matplotlib bar chart provides a detailed explanation of .bar_label.
- How to add multiple annotations to a barplot
- stack bar plot in matplotlib and add label to each section
- How to annotate barplot with percent by hue/legend group
- How to add percentages on top of bars in seaborn
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 |