I am having a problem with pandas in Python, which I think might be due to incorrect use of groupby.
My dataset looks like this:
A B C CompanyName Sector year
4 9 3 d 10 2000
2 4 45 f 78 2001
7 53 55 y 99 2000
I want it to look like this:
MeanA MeanB MeanC medianC Sector Year
bla bla bla bla bla bla
bla bla bla bla bla bla
bla bla bla bla bla bla
bla bla bla bla bla bla
So the first thing that came to mind was to group by Sector and year, then use .agg() to calculate MeanA, MeanB, MeanC and medianC. The problem is that for MeanC I noticed strange empty cells, even in rows where medianC exists, so I would expect the mean to have a value there too.
This is an example of the code:
Data=Data.groupby(['Sector','year']).agg({'A':'mean', 'B':'mean', "C":['mean', 'median']})
I think I am using the groupby function incorrectly; any help would be appreciated.
PS. My dataset contains about 120k rows, covering 2000 to 2015, with multiple companies.
Solution 1:[1]
What is the dtype of each column? Are A, B and C all numeric, or can you convert them to int or float, or is your dataset dirty? If groupby works for A and B, data quality is likely the issue if it suddenly fails for C.
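One way to run that check is sketched below. The sample frame and its dirty "n/a" entry are made up for illustration and are not taken from the asker's data; the idea is only to inspect the dtypes and coerce C to a numeric type so that unparseable entries become NaN and can be counted.

```python
import pandas as pd

# Hypothetical sample: column C holds strings, one of which is not a number.
df = pd.DataFrame({
    "A": [4, 2, 7],
    "B": [9, 4, 53],
    "C": ["3", "n/a", "55"],
    "Sector": [10, 78, 99],
    "year": [2000, 2001, 2000],
})

# Inspect the column types; C is not a numeric dtype here.
print(df.dtypes)

# Convert C to numbers; values that cannot be parsed become NaN.
df["C"] = pd.to_numeric(df["C"], errors="coerce")

# Count how many entries could not be converted.
n_bad = int(df["C"].isna().sum())
print(n_bad)
```

If the count is large, cleaning the raw values before aggregating is probably the better fix than coercing them away.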
As an aggregation function, you can call mean() directly:
df.groupby(['Sector', 'year'])['C'].mean()
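To get the flat layout asked for in the question, named aggregation may be a convenient alternative. Passing a dict with a list of functions to .agg() returns two-level column headers, which might account for cells that look empty when the result is displayed, though that is only a guess. The sketch below uses made-up sample values and assumes the columns are already numeric.

```python
import pandas as pd

# Hypothetical sample data with numeric A, B and C.
df = pd.DataFrame({
    "A": [4, 2, 7, 1],
    "B": [9, 4, 53, 3],
    "C": [3, 45, 55, 5],
    "Sector": [10, 78, 10, 10],
    "year": [2000, 2001, 2000, 2000],
})

# Named aggregation: each keyword becomes one flat output column.
# as_index=False keeps Sector and year as ordinary columns.
summary = (
    df.groupby(["Sector", "year"], as_index=False)
      .agg(
          MeanA=("A", "mean"),
          MeanB=("B", "mean"),
          MeanC=("C", "mean"),
          MedianC=("C", "median"),
      )
)
print(summary)
```

Each row of summary is one (Sector, year) pair, with single-level column names instead of a MultiIndex.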
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | KingOtto |