Matplotlib / Seaborn violin plots for different data sizes
I have 3 one-dimensional data arrays A, B, C, each of a different length.
I would like to make a violin plot with 3 violins, one per array. How do I do this?
EDIT: I have solved the problem by writing a proxy function, but having to convert the labels into a column for every array feels wasteful. Is it possible to do this more neatly or more efficiently?
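import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def dict2pandas(d, keyname, valname):
    # Build one long-form frame per array (label column + value column), then stack them
    dframes = []
    for k, v in d.items():
        dframes += [pd.DataFrame({keyname: [k] * len(v), valname: v})]
    return pd.concat(dframes, ignore_index=True)

data = {
    'A': np.random.normal(1, 1, 100),
    'B': np.random.normal(2, 1, 110),
    'C': np.random.normal(3, 1, 120)
}
dataDF = dict2pandas(data, 'arrays', 'values')
fig, ax = plt.subplots()
# Note: the keyword is ax, not axis; newer seaborn releases rename scale to density_norm
sns.violinplot(data=dataDF, x='arrays', y='values', scale='width', ax=ax)
plt.show()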
Solution 1:[1]
I too could find no better idea than filling the Pandas DataFrame with NaNs, but this approach is perhaps a little tidier:
import numpy as np
import pandas as pd
import seaborn as sns
# OP's data
data = {
    'A': np.random.normal(1, 1, 100),
    'B': np.random.normal(2, 1, 110),
    'C': np.random.normal(3, 1, 120)
}
# Create DataFrame where NaNs fill shorter arrays
df = pd.DataFrame([data['A'], data['B'], data['C']]).transpose()
# Label the columns of the DataFrame
df = df.set_axis(['A','B','C'], axis=1)
# Violin plot
sns.violinplot(data=df)
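If the NaN padding is undesirable, the long-form layout from the question can also be built in one pass, without creating a DataFrame per array. The following is only a sketch under that assumption: the helper name dict_to_long is hypothetical (it is not part of pandas or seaborn), and the seeded generator is used purely so the example is reproducible.

```python
import numpy as np
import pandas as pd

def dict_to_long(d, keyname, valname):
    """Hypothetical helper: turn {label: 1-D array} into a long-form DataFrame."""
    keys = list(d.keys())
    lengths = [len(v) for v in d.values()]
    # Repeat each label once per element of its array
    labels = np.repeat(keys, lengths)
    # Stack all arrays end to end; no NaN padding is needed
    values = np.concatenate(list(d.values()))
    return pd.DataFrame({keyname: labels, valname: values})

# Same shape of data as in the question, with a seeded generator (an assumption)
rng = np.random.default_rng(0)
data = {
    'A': rng.normal(1, 1, 100),
    'B': rng.normal(2, 1, 110),
    'C': rng.normal(3, 1, 120),
}
long_df = dict_to_long(data, 'arrays', 'values')
```

The resulting frame has one row per observation, so it can be passed to the same call as in the question, e.g. sns.violinplot(data=long_df, x='arrays', y='values'). Whether this is noticeably faster than the per-array version depends on the data size; the main gain is that only one DataFrame is ever constructed.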
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Matthew Walker |