Connecting means in seaborn box plot
I want to connect the means of several box plots with a line. I can draw the basic plot, but I cannot get the line to pass through the box plot means, and the boxes are offset from their ticks on the x-axis. There is a similar post, but it does not connect the means: Python: seaborn pointplot and boxplot in one plot but shifted on the x-axis
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy','Jason', 'Molly', 'Tina', 'Jake', 'Amy','Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
'pre_score': [4, 24, 31, 2, 3,25, 94, 57, 62, 70,5, 43, 23, 23, 51]
}
data = pd.DataFrame(raw_data, columns = ['first_name', 'pre_score'])
first_name pre_score
0 Jason 4
1 Molly 24
2 Tina 31
3 Jake 2
4 Amy 3
5 Jason 25
6 Molly 94
7 Tina 57
8 Jake 62
9 Amy 70
10 Jason 5
11 Molly 43
12 Tina 23
13 Jake 23
14 Amy 51
sns.set_style("ticks")
ax = sns.stripplot(x='first_name', y='pre_score', hue='first_name', jitter=True, dodge=True, size=6, zorder=0, alpha=0.5, linewidth =1, data=data)
ax = sns.boxplot(x='first_name', y='pre_score', hue='first_name', dodge=True, showfliers=True, linewidth=0.8, showmeans=True, data=data)
ax = sns.lineplot(x='first_name', y='pre_score', color='k', data=data.groupby(['first_name'], as_index=False).mean())
fig_size = [18.0, 10.0]
plt.rcParams["figure.figsize"] = fig_size
handles, labels = ax.get_legend_handles_labels()
legend_len = len(labels)
ax.legend(handles[legend_len // 2:], labels[legend_len // 2:], bbox_to_anchor=(1.01, 1), loc=2, borderaxespad=0.1)
As the plot shows, the line drawn by sns.lineplot does not pass through the means, and the box plots are offset from the names on the x-axis.
How can I fix this?
Solution 1:[1]
When working with seaborn plots, I strongly recommend always passing order= (and hue_order= if applicable), so that you avoid nasty surprises with the categories not showing up in a consistent order between calls.
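A small pandas-only sketch of why an explicit order helps, using the data from the question. It assumes, as a likely but unverified explanation of the plot in the question, that the categorical plots followed the order in which the names first appear, while the frame passed to lineplot had been sorted by groupby(). The names appearance_order and grouped_order are only illustrative.

```python
import pandas as pd

raw_data = {
    'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'] * 3,
    'pre_score': [4, 24, 31, 2, 3, 25, 94, 57, 62, 70, 5, 43, 23, 23, 51],
}
data = pd.DataFrame(raw_data)

# Order in which the names first appear in the raw data
appearance_order = data['first_name'].unique().tolist()

# groupby() sorts its keys by default, so the aggregated result is alphabetical
grouped_order = data.groupby('first_name')['pre_score'].mean().index.tolist()

print(appearance_order)  # ['Jason', 'Molly', 'Tina', 'Jake', 'Amy']
print(grouped_order)     # ['Amy', 'Jake', 'Jason', 'Molly', 'Tina']
```

The two lists contain the same names in different positions, so passing one shared order to every plotting call removes the ambiguity.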
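For the purpose of your question, you can replace the lineplot with a pointplot, which automatically aggregates the values by category and connects them with a line. The code below also drops hue= and dodge=True from the stripplot and boxplot calls; dodging by a hue that repeats the x variable is what shifts each box away from its tick.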
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy','Jason', 'Molly', 'Tina', 'Jake', 'Amy','Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
'pre_score': [4, 24, 31, 2, 3,25, 94, 57, 62, 70,5, 43, 23, 23, 51]
}
data = pd.DataFrame(raw_data, columns = ['first_name', 'pre_score'])
# define the order in which the categories will be plotted on the x-axis
order = np.sort(data['first_name'].unique()) # you could also create a list by hand if you want a specific order
sns.set_style("ticks")
ax = sns.stripplot(x='first_name', y='pre_score', order=order, jitter=True, size=6, zorder=0, alpha=0.5, linewidth =1, data=data)
ax = sns.boxplot(x='first_name', y='pre_score', order=order, showfliers=True, linewidth=0.8, showmeans=True, data=data)
# ci=None turns off the error bars; on seaborn >= 0.12 the equivalent argument is errorbar=None
ax = sns.pointplot(x='first_name', y='pre_score', order=order, data=data, ci=None, color='black')
If for some reason you do not want to, or cannot, use a seaborn function that takes an order argument, aggregate by hand in pandas and call reindex() with your order, so that the values are in the right order in the dataframe before you plot them with the tool of your choice.
For instance, you could replace the call to pointplot() above with:
# calculate the means and ensure they are in the same order as the boxplots
means = data.groupby('first_name')['pre_score'].mean().reindex(order)
ax.plot(means.index, means.values, 'ko-', lw=3)
This gives the same result.
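The reindex() step can be checked on its own, without plotting. The sketch below uses a hand-written order, which is a hypothetical choice made only for illustration, and prints the position that each mean would occupy on the categorical x-axis.

```python
import pandas as pd

raw_data = {
    'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'] * 3,
    'pre_score': [4, 24, 31, 2, 3, 25, 94, 57, 62, 70, 5, 43, 23, 23, 51],
}
data = pd.DataFrame(raw_data)

# Hand-written order (hypothetical; any permutation of the names works)
order = ['Jake', 'Jason', 'Tina', 'Amy', 'Molly']

# Aggregate by hand, then reorder the result to match the plotting order
means = data.groupby('first_name')['pre_score'].mean().reindex(order)

# Position i on the x-axis corresponds to order[i]
for position, (name, mean) in enumerate(means.items()):
    print(position, name, round(mean, 2))
```

Note that reindex() fills in NaN for any name in order that does not occur in the data, so a typo in a hand-written list shows up as a gap in the line rather than as an error.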
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Diziet Asahi |