Plot distribution of pandas dataframe depending on target value
I want to visualize the distribution of the grade depending on the sex (male/female).
My dataframe:
import pandas as pd

df = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7", "K8", "K9"],
        "grade": [1.0, 2.0, 4.0, 1.0, 5.0, 2.0, 3.0, 1.0, 6.0, 3.0],
        "sex": [1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
    }
)
  key  grade  sex
0  K0    1.0    1
1  K1    2.0    0
2  K2    4.0    0
3  K3    1.0    1
4  K4    5.0    0
5  K5    2.0    1
6  K6    3.0    0
7  K7    1.0    1
8  K8    6.0    0
9  K9    3.0    0
My approach was to use a histogram and plot the distribution. However, I don't know how to visualize the distribution separately for each value of the target (sex). There are some examples in the Seaborn documentation, but I failed to apply them to my specific problem.
All I have is this:
import matplotlib.pyplot as plt

plt.hist(df['grade'], bins=10, edgecolor='black')
plt.xlabel('grade')
plt.ylabel('count')
Solution 1:[1]
You can do this in matplotlib:
import matplotlib.pyplot as pyplot

# Split the grades by the value of 'sex'
x = df.loc[df['sex'] == 1, 'grade']
y = df.loc[df['sex'] == 0, 'grade']

# Bin edges 1..7 give one bin per grade, so grade 6 is included
bins = list(range(1, 8))
pyplot.hist(x, bins, alpha=0.5, label='sex=1')
pyplot.hist(y, bins, alpha=0.5, label='sex=0')
pyplot.legend(loc='upper right')
pyplot.show()
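If the overlapping, semi-transparent bars are hard to read, matplotlib can also draw the two groups side by side when `hist` is given a list of datasets. The following is only a sketch of that variant, not part of the original answer; the half-integer bin edges and the names `grades_sex1` and `grades_sex0` are my own assumptions, chosen so that each integer grade sits in the middle of its own bin.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7", "K8", "K9"],
        "grade": [1.0, 2.0, 4.0, 1.0, 5.0, 2.0, 3.0, 1.0, 6.0, 3.0],
        "sex": [1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
    }
)

# Assumed bin edges 0.5, 1.5, ..., 6.5: one bin centred on each grade 1..6.
bins = np.arange(0.5, 7.0, 1.0)

# Hypothetical variable names; one array of grades per group.
grades_sex1 = df.loc[df["sex"] == 1, "grade"].to_numpy()
grades_sex0 = df.loc[df["sex"] == 0, "grade"].to_numpy()

fig, ax = plt.subplots()
# Passing a list of datasets makes matplotlib place the bars next to each other.
counts, edges, _ = ax.hist(
    [grades_sex1, grades_sex0],
    bins=bins,
    label=["sex=1", "sex=0"],
    edgecolor="black",
)
ax.set_xlabel("grade")
ax.set_ylabel("count")
ax.legend(loc="upper right")
```

Call `plt.show()` afterwards to display the figure. `counts` holds one row of per-bin counts for each group, which is handy for checking the plot against the raw data.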
Solution 2:[2]
You can also do this with pandas' built-in plotting:
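# Shared bin edges and transparency keep both groups comparable and visible
bins = list(range(1, 8))
df.loc[df['sex'] == 0, 'grade'].plot.hist(bins=bins, alpha=0.5, label='sex=0', legend=True)
df.loc[df['sex'] == 1, 'grade'].plot.hist(bins=bins, alpha=0.5, label='sex=1', legend=True)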
You can also get a smooth curve with kde() (this requires SciPy to be installed):
df[df['sex'] == 0]['grade'].plot.kde()
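Because the grades in the question take only a few discrete values, a table of counts per grade and sex may be easier to read than a histogram or a KDE curve. Below is a minimal sketch of that idea, assuming `pd.crosstab` fits the data; it is an alternative I am suggesting rather than something from the original answer, and the name `grade_counts` is purely illustrative.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7", "K8", "K9"],
        "grade": [1.0, 2.0, 4.0, 1.0, 5.0, 2.0, 3.0, 1.0, 6.0, 3.0],
        "sex": [1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
    }
)

# Rows are grades, columns are the values of 'sex', cells are counts.
grade_counts = pd.crosstab(df["grade"], df["sex"])

# Grouped bar chart: one bar per sex value for every grade.
ax = grade_counts.plot.bar(rot=0)
ax.set_xlabel("grade")
ax.set_ylabel("count")
```

Printing `grade_counts` before plotting is a quick way to confirm the numbers; with matplotlib installed, `matplotlib.pyplot.show()` displays the chart.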
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Mahsa Yazdani |