How to create and annotate a stacked proportional bar chart

I'm struggling to create a stacked bar chart derived from value_counts() of the columns of a dataframe.

Assume a dataframe like the following, where responder is not important, but I would like to stack the count of [1,2,3,4,5] for all q# columns.

responder, q1, q2, q3, q4, q5
------------------------------
r1, 5, 3, 2, 4, 1
r2, 3, 5, 1, 4, 2
r3, 2, 1, 3, 4, 5
r4, 1, 4, 5, 3, 2
r5, 1, 2, 5, 3, 4
r6, 2, 3, 4, 5, 1
r7, 4, 3, 2, 1, 5

It should look something like the following, except that each bar would be labeled by q# and would include 5 sections for the count of [1,2,3,4,5] from the data:

[image: example of a stacked bar chart]

Ideally, all bars will be "100%" wide, showing each count as a proportion of the bar. However, it's guaranteed that each responder row will have one entry for each, so the percentage is just a bonus if possible.

Any help would be much appreciated, with a slight preference for a matplotlib solution.



Solution 1:[1]

You can calculate the widths of the bar segments as proportions and obtain the stacked bar plot with ax = percents.T.plot(kind='barh', stacked=True), where percents is a DataFrame with q1, ..., q5 as columns and the responses 1, ..., 5 as the index. Each column of percents must sum to 1 so that every q# bar is exactly 100% wide, which means dividing each column of the counts by its own total (2000 responses per question here).

>>> percents
       q1      q2      q3      q4      q5
1  0.2015  0.2040  0.2115  0.1995  0.2070
2  0.2070  0.1905  0.2070  0.2070  0.1965
3  0.1965  0.2115  0.1795  0.1905  0.1935
4  0.1970  0.1970  0.1885  0.1960  0.2090
5  0.1980  0.1970  0.2135  0.2070  0.1940

Then you can use ax.patches to add a label to every bar segment. The labels are generated from the original counts DataFrame: counts = df.apply(lambda x: x.value_counts())

>>> counts
    q1   q2   q3   q4   q5
1  403  408  423  399  414
2  414  381  414  414  393
3  393  423  359  381  387
4  394  394  377  392  418
5  396  394  427  414  388

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## create some data similar to yours
np.random.seed(42)
categories = ['q1','q2','q3','q4','q5']
df = pd.DataFrame(np.random.randint(1,6,size=(2000, 5)), columns=categories)

## counts will be used for the labels (rows: responses 1-5, columns: q1-q5)
counts = df.apply(lambda x: x.value_counts())

## percents will be used to determine the length of each bar segment;
## each column is divided by its own total, so every q# bar sums to 1 (100%)
percents = counts.div(counts.sum(axis=0), axis=1)

## ax.patches are ordered series by series (response 1 for q1..q5, then response 2, ...),
## which matches the row-major order of counts.values
counts_array = counts.values
nrows, ncols = counts_array.shape
indices = [(i,j) for i in range(0,nrows) for j in range(0,ncols)]

ax = percents.T.plot(kind='barh', stacked=True)
ax.legend(bbox_to_anchor=(1, 1.01), loc='upper right')
for i, p in enumerate(ax.patches):
    # the width is a fraction (0-1), so multiply by 100 to display a percentage
    ax.annotate(f"({p.get_width() * 100:.1f}%)", (p.get_x() + p.get_width() - 0.15, p.get_y() - 0.10), xytext=(5, 10), textcoords='offset points')
    ax.annotate(str(counts_array[indices[i]]), (p.get_x() + p.get_width() - 0.15, p.get_y() + 0.10), xytext=(5, 10), textcoords='offset points')
plt.show()

[image: stacked bar chart with one bar per q# column and a segment for each response 1-5]
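As a quick check of the counting and normalization steps, here is a minimal sketch that applies them to the seven-row sample from the question instead of random data. The reindex(range(1, 6), fill_value=0) call is an addition that is not part of the answer above; it assumes the responses are always the integers 1 to 5, and it keeps a response that never occurs in a column (no one answered 2 on q4) as an explicit 0 rather than NaN.

```python
import pandas as pd

# the seven-row sample from the question, with responder as the index
data = {'responder': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'],
        'q1': [5, 3, 2, 1, 1, 2, 4], 'q2': [3, 5, 1, 4, 2, 3, 3],
        'q3': [2, 1, 3, 5, 5, 4, 2], 'q4': [4, 4, 4, 3, 3, 5, 1],
        'q5': [1, 2, 5, 2, 4, 1, 5]}
df = pd.DataFrame(data).set_index('responder')

# count each response 1-5 per q# column; reindex turns responses that never
# occur in a column into explicit zeros (assumes responses are exactly 1..5)
counts = df.apply(lambda s: s.value_counts().reindex(range(1, 6), fill_value=0))

# divide each column by its own total, so every q# bar in percents.T spans exactly 1
percents = counts.div(counts.sum(axis=0), axis=1)

print(counts)
print(percents.sum(axis=0))
```

Plotting percents.T with kind='barh' and stacked=True then gives one 100% bar per question, exactly as in the answer.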

Solution 2:[2]

  • From matplotlib 3.4.0, use matplotlib.axes.Axes.bar_label (also available as matplotlib.pyplot.bar_label).
  • pro = df.div(df.sum(axis=1), axis=0) creates a dataframe of proportions relative to each row's total. Note the importance of summing and dividing along the correct axis.
  • Use pandas.DataFrame.plot with kind='barh' and stacked=True to plot the pro dataframe, which creates an x-axis with the correct range (0 to 1). matplotlib is the default plotting backend.
  • .bar_label has a labels parameter, which accepts custom labels.
    • labels is created with a list comprehension, where the values (vals) from df are combined with the proportions from pro, read from the width of each bar patch.
    • (w := v.get_width()) > 0 can be used to conditionally show annotations, here only for segments wider than 0. := is an assignment expression, available from Python 3.8.
      • labels = [f'{val}\n({v.get_width()*100:.1f}%)' for v, val in zip(c, vals)] can be used if there's no need to check the patch size.
      • labels = [f'{val}\n({v.get_width()*100:.1f}%)' if v.get_width() > 0 else '' for v, val in zip(c, vals)] works without the :=, but requires calling .get_width() twice.
  • Tested in Python 3.10, pandas 1.3.5, matplotlib 3.5.1, seaborn 0.11.2
import pandas as pd
import matplotlib.pyplot as plt

# sample dataframe from OP
data = {'responder': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'], 'q1': [5, 3, 2, 1, 1, 2, 4], 'q2': [3, 5, 1, 4, 2, 3, 3], 'q3': [2, 1, 3, 5, 5, 4, 2], 'q4': [4, 4, 4, 3, 3, 5, 1], 'q5': [1, 2, 5, 2, 4, 1, 5]}

# The labels to be on the y-axis should be set as the index
# If the column names and index need to be swapped, use .T to transpose the dataframe
df = pd.DataFrame(data).set_index('responder')

# create dataframe with proportions (each row divided by its own total)
pro = df.div(df.sum(axis=1), axis=0)

# plot
ax = pro.plot(kind='barh', figsize=(12, 10), stacked=True)

# move the legend outside the plot area
ax.legend(bbox_to_anchor=(1, 1.01), loc='upper left')

# column names from pro, used to get the matching column values from df
cols = pro.columns

# iterate through each group of containers and the corresponding column name
for c, col in zip(ax.containers, cols):

    # get the values for the column from df
    vals = df[col]

    # create a custom label for bar_label; segments with zero width get an empty label
    labels = [f'{val}\n({w*100:.1f}%)' if (w := v.get_width()) > 0 else '' for v, val in zip(c, vals)]

    # annotate each section with the custom labels
    ax.bar_label(c, labels=labels, label_type='center', fontweight='bold')

plt.show()

[image: stacked bar chart with one bar per responder]
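To see what the custom labels look like without rendering a figure, here is a small sketch that builds a two-bar stacked chart from hypothetical counts (the counts DataFrame is made up for illustration), applies the same list comprehension, and collects the strings that bar_label places. It assumes the non-interactive Agg backend so it can run headless, and the zero-width segment shows the := condition producing an empty label.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical counts: rows are questions, columns are responses 1-3;
# q2 never received a 2, so that segment has zero width
counts = pd.DataFrame({1: [3, 4], 2: [2, 0], 3: [5, 6]}, index=['q1', 'q2'])

# proportions relative to each row's total, so every bar spans 0 to 1
pro = counts.div(counts.sum(axis=1), axis=0)

ax = pro.plot(kind='barh', stacked=True)

all_labels = []
for c, col in zip(ax.containers, pro.columns):
    vals = counts[col]
    # "count\n(percent)" for visible segments, '' for zero-width ones
    labels = [f'{val}\n({w * 100:.1f}%)' if (w := v.get_width()) > 0 else ''
              for v, val in zip(c, vals)]
    # bar_label returns the Annotation objects it created, one per bar
    texts = ax.bar_label(c, labels=labels, label_type='center')
    all_labels.append([t.get_text() for t in texts])

print(all_labels)
plt.close(ax.figure)
```

With real data, drop the Agg line and call plt.show() to display the chart.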

  • Transposing df with df = pd.DataFrame(data).set_index('responder').T swaps the index and columns and produces the following plot, with one bar per q# column. figsize=(12, 10) may need to be adjusted.

[image: stacked bar chart with one bar per q# column and a segment for each responder]
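Transposing before computing the proportions changes what each segment represents: each q# bar is then split by responder, and each segment's width is that responder's answer divided by the question's total. A minimal sketch of this variant, using the sample data, checks that every q# row of the transposed proportions sums to 1:

```python
import pandas as pd

# the sample data from the question
data = {'responder': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'],
        'q1': [5, 3, 2, 1, 1, 2, 4], 'q2': [3, 5, 1, 4, 2, 3, 3],
        'q3': [2, 1, 3, 5, 5, 4, 2], 'q4': [4, 4, 4, 3, 3, 5, 1],
        'q5': [1, 2, 5, 2, 4, 1, 5]}

# .T makes q1..q5 the index (one bar each) and r1..r7 the columns (one segment each)
df_t = pd.DataFrame(data).set_index('responder').T

# each q# row is divided by that question's total across all responders
pro_t = df_t.div(df_t.sum(axis=1), axis=0)

print(pro_t.index.tolist())
print(pro_t.sum(axis=1))
```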

DataFrames

  • df
           q1  q2  q3  q4  q5
responder                    
r1          5   3   2   4   1
r2          3   5   1   4   2
r3          2   1   3   4   5
r4          1   4   5   3   2
r5          1   2   5   3   4
r6          2   3   4   5   1
r7          4   3   2   1   5
  • pro
                 q1        q2        q3        q4        q5
responder                                                  
r1         0.333333  0.200000  0.133333  0.266667  0.066667
r2         0.200000  0.333333  0.066667  0.266667  0.133333
r3         0.133333  0.066667  0.200000  0.266667  0.333333
r4         0.066667  0.266667  0.333333  0.200000  0.133333
r5         0.066667  0.133333  0.333333  0.200000  0.266667
r6         0.133333  0.200000  0.266667  0.333333  0.066667
r7         0.266667  0.200000  0.133333  0.066667  0.333333
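The note about summing and dividing along the correct axis can be illustrated with the sample data. This sketch compares df.div(..., axis=0) with the plain / operator; for a DataFrame divided by a Series, / aligns the Series with the columns, so dividing by the per-responder totals matches no labels and produces only NaN.

```python
import pandas as pd

# the sample data from the question
data = {'responder': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'],
        'q1': [5, 3, 2, 1, 1, 2, 4], 'q2': [3, 5, 1, 4, 2, 3, 3],
        'q3': [2, 1, 3, 5, 5, 4, 2], 'q4': [4, 4, 4, 3, 3, 5, 1],
        'q5': [1, 2, 5, 2, 4, 1, 5]}
df = pd.DataFrame(data).set_index('responder')

# Series indexed by responder (r1..r7); every total is 15 in this sample
row_totals = df.sum(axis=1)

# correct: axis=0 aligns row_totals with the index, dividing each row by its own total
pro = df.div(row_totals, axis=0)

# incorrect: "/" aligns row_totals with the columns (q1..q5), which share no labels
# with r1..r7, so every value is NaN and r1..r7 are added as extra columns
wrong = df / row_totals

print(pro.round(6))
print(wrong.shape, wrong.isna().all().all())
```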

Referenced

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2