I am having a problem with pandas in Python, which I think might be due to incorrect use of groupby.

My dataset looks like this:

 A    B    C    CompanyName   Sector    year
 4    9    3         d          10       2000 
 2    4    45        f          78       2001
 7   53    55        y          99       2000

I want it to look like this:

 MeanA MeanB MeanC medianC   Sector  Year
 bla     bla   bla  bla        bla    bla
 bla     bla   bla  bla        bla    bla
 bla     bla   bla  bla        bla    bla
 bla     bla   bla  bla        bla    bla

So the first thing that came to mind was to group by Sector and year, then use .agg() to calculate meanA, meanB, meanC and medianC. The problem is that for meanC I noticed strange empty cells, even though medianC exists for the same rows, so meanC should have at least some value.

This is an example of my code:

 Data = Data.groupby(['Sector', 'year']).agg({'A': 'mean', 'B': 'mean', 'C': ['mean', 'median']})
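The dict form above produces two-level column names such as ('C', 'mean') and ('C', 'median'). A minimal sketch of one way to get the flat layout shown in the desired output is pandas named aggregation (available in pandas 0.25 and later), using only the three sample rows from the question; the output column names are simply the ones from the desired table.

```python
import pandas as pd

# The three sample rows from the question
data = pd.DataFrame({
    "A": [4, 2, 7],
    "B": [9, 4, 53],
    "C": [3, 45, 55],
    "CompanyName": ["d", "f", "y"],
    "Sector": [10, 78, 99],
    "year": [2000, 2001, 2000],
})

# Named aggregation: each keyword becomes one flat output column,
# given as (source column, aggregation function)
result = (
    data.groupby(["Sector", "year"])
    .agg(
        MeanA=("A", "mean"),
        MeanB=("B", "mean"),
        MeanC=("C", "mean"),
        medianC=("C", "median"),
    )
    .reset_index()  # turn Sector and year back into ordinary columns
)
print(result)
```

Because every group in this tiny sample has a single row, MeanC and medianC are identical here; on the real 120k-row dataset they would generally differ.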

I think I am using groupby incorrectly; any help would be appreciated.

PS: My dataset contains about 120k rows spanning 2000 to 2015, with multiple companies.



Solution 1:[1]

What is the dtype of each column? Are A, B and C all numeric, can you convert them to int or float, or is your dataset dirty? If groupby works for A and B but suddenly fails for C, data quality is likely the issue.
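One way to follow up on the dtype question is sketched below. It assumes, hypothetically, that column C was read in as text and that one entry ("n/a") is not a number; the fourth row is made up for illustration. `pd.to_numeric(..., errors="coerce")` turns anything it cannot parse into NaN, which makes the offending rows easy to find.

```python
import pandas as pd

# Hypothetical data: C was read as strings and one entry is not numeric
df = pd.DataFrame({
    "A": [4, 2, 7, 5],
    "B": [9, 4, 53, 1],
    "C": ["3", "45", "55", "n/a"],
    "CompanyName": ["d", "f", "y", "z"],
    "Sector": [10, 78, 99, 10],
    "year": [2000, 2001, 2000, 2000],
})

# 1. Inspect the dtypes: a numeric column reported as object/str is suspect
print(df.dtypes)

# 2. Parse C as numbers; unparseable entries become NaN
c_numeric = pd.to_numeric(df["C"], errors="coerce")

# 3. Rows that had a value but could not be parsed are the dirty ones
bad_rows = df[c_numeric.isna() & df["C"].notna()]
print(bad_rows)

# 4. Replace C with its numeric version before grouping
df["C"] = c_numeric
print(df.dtypes)
```

Whether the dirty entries should become NaN, be fixed, or be dropped depends on the data; coercion is just the quickest way to see them.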

As an aggregation, you can also select the column and call mean() directly:

df.groupby(['Sector', 'year'])['C'].mean()  # mean of C within each (Sector, year) group
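A runnable sketch of the one-liner above, assuming the question's three sample rows. Selecting C before aggregating keeps the string column CompanyName out of the mean. The final filter, which lists groups where the mean is missing but the median is not, is an illustrative diagnostic added here, not part of the original answer.

```python
import pandas as pd

# Sample rows from the question (the real dataset would be loaded instead)
df = pd.DataFrame({
    "A": [4, 2, 7],
    "B": [9, 4, 53],
    "C": [3, 45, 55],
    "CompanyName": ["d", "f", "y"],
    "Sector": [10, 78, 99],
    "year": [2000, 2001, 2000],
})

# Group once, then compute mean and median of C per (Sector, year)
grouped_c = df.groupby(["Sector", "year"])["C"]
stats = pd.DataFrame({"MeanC": grouped_c.mean(), "medianC": grouped_c.median()})

# Groups where the mean is empty but the median is not deserve a closer look
suspicious = stats[stats["MeanC"].isna() & stats["medianC"].notna()]
print(stats)
print(suspicious)
```

On this clean sample `suspicious` is empty; on the real data, any rows it returns identify the (Sector, year) groups whose C values need inspecting.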

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 KingOtto