Splitting and grouping a pandas DataFrame into intervals and calculating the mean of a different column

I have the well-known Titanic dataset and I am trying to find the survival probability of a person based on their age and sex. The inputs I am given are the number of intervals the dataset is going to be split into (the split is based on Age), an age, and a sex. Also, some Age values are missing, so I should fill them with the mean of the other Age records.

The resulting dataset should look something like this:

"AgeInterval"	"Sex"	"Survival Probability"
(1.977, 13.5]	"male"	0.21
(1.977, 13.5]	"female"	0.28
(13.5, 25.0]	"male"	0.10
(13.5, 25.0]	"female"	0.15

From this, I have to look up the probability for a given age and sex.

So far I've tried:

df = df.fillna(df["Age"].mean())

to fill the values

df["AgeInterval"] = pd.cut(df.Age, bins=n_interval, right=True)

to create the intervals

df = df.groupby(['AgeInterval', 'Sex'])

to group the intervals along with sex,

df = df.agg({'Survived' : 'mean'})

to calculate mean of Survived

Although this gives me some results, they are wrong, and I can't find the right solution to this problem.

The other problem is looking up the value, for which I tried the following:
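For reference, the four steps above can be put together as one runnable sketch. It uses a tiny made-up sample in place of the real Titanic data and a hypothetical n_interval of 2, so the numbers mean nothing. One thing it does differently: df.fillna(value) on the whole frame writes the mean age into every column that has missing values, so the sketch fills only Age. Whether that accounts for the unexpected results is not clear from the question.

```python
import pandas as pd

# Tiny made-up sample standing in for the Titanic data (not real records).
df = pd.DataFrame({
    "Age": [2.0, 10.0, None, 30.0, 40.0, 18.0, 22.0, None],
    "Sex": ["male", "female", "male", "female",
            "male", "female", "male", "female"],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1],
})
n_interval = 2  # hypothetical input value

# Fill only the Age column with the mean of the known ages.
df["Age"] = df["Age"].fillna(df["Age"].mean())

# Split Age into n_interval equal-width, right-closed intervals.
df["AgeInterval"] = pd.cut(df["Age"], bins=n_interval, right=True)

# Mean of Survived per (AgeInterval, Sex). observed=True skips combinations
# with no rows; reset_index turns the group keys back into columns.
summary = (
    df.groupby(["AgeInterval", "Sex"], observed=True)["Survived"]
      .mean()
      .reset_index(name="Survival Probability")
)
print(summary)
```

Keeping the result in a new variable (summary here) instead of reassigning df also leaves the original rows available for checking the numbers by hand.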

result = df.loc[(df["AgeInterval"]==age)&(df["Sex"]==sex)]

But this only raises a KeyError. I don't know why, because when I print df, I can see AgeInterval and Sex.
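A likely explanation for the KeyError: after groupby(...).agg(...), the group keys AgeInterval and Sex are index levels rather than columns. They still show up when the frame is printed, but df["AgeInterval"] cannot find them. Separately, comparing an interval with == age does not test whether the age falls inside it. A minimal sketch of one way around both, assuming made-up data and a hypothetical helper named survival_probability:

```python
import pandas as pd

# Made-up rows standing in for the Titanic data (not real records).
df = pd.DataFrame({
    "Age": [2.0, 10.0, 20.0, 30.0, 40.0, 18.0, 22.0, 35.0],
    "Sex": ["male", "female", "male", "female",
            "male", "female", "male", "female"],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1],
})
df["AgeInterval"] = pd.cut(df["Age"], bins=2, right=True)

grouped = df.groupby(["AgeInterval", "Sex"], observed=True).agg({"Survived": "mean"})

# The group keys live in the index, so only "Survived" is a column here.
print(grouped.index.names)
print(list(grouped.columns))

# reset_index() turns the index levels back into ordinary columns.
table = grouped.reset_index()


def survival_probability(table, age, sex):
    """Hypothetical helper: mean of Survived for the interval containing age."""
    # "age in interval" tests membership in a pandas Interval.
    in_interval = pd.Series(
        [age in iv for iv in table["AgeInterval"]], index=table.index
    )
    match = table.loc[in_interval & (table["Sex"] == sex), "Survived"]
    # None signals that no interval/sex combination matched.
    return match.iloc[0] if not match.empty else None


print(survival_probability(table, 25, "female"))
```

An age outside every interval returns None in this sketch; raising an error instead would be an equally reasonable choice.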

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
