Pandas - DataError: No numeric types to aggregate
I have a DataFrame with 5 columns, where the column I need to aggregate holds strings and has NaN values. I tried replacing the NaN values with 0 and then converting the column to numeric, but I still get "DataError: No numeric types to aggregate" when performing a groupby and aggregate.
df.head()
df['Profit (in millions)'] = df['Profit (in millions)'].str.replace('N.A', '0')
pd.to_numeric(df['Profit (in millions)'], errors ='ignore')
df_new = df.groupby('Year')['Profit (in millions)'].median()
df_new.head(7)
Solution 1:[1]
There are two problems: you forgot to assign the result back, and with errors='ignore', if the column contains at least one non-parseable value, the column is returned unchanged:
pd.to_numeric(df['Profit (in millions)'], errors ='ignore')
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
If 'raise', then invalid parsing will raise an exception
If 'coerce', then invalid parsing will be set as NaN
If 'ignore', then invalid parsing will return the input
So use errors='coerce' and assign the output back to the column:
df['Profit (in millions)'] = pd.to_numeric(df['Profit (in millions)'], errors ='coerce')
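To see the fix end to end, here is a minimal sketch. The sample data is made up for illustration (the question does not show the real DataFrame); only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "Year": [2001, 2001, 2002, 2002, 2003],
    "Profit (in millions)": ["10.5", "N.A", "3", "7", None],
})

# Coerce unparseable entries such as 'N.A' to NaN and assign the result back.
df["Profit (in millions)"] = pd.to_numeric(
    df["Profit (in millions)"], errors="coerce"
)

# The column is now numeric, so groupby can aggregate it.
print(df["Profit (in millions)"].dtype)

# median() skips NaN values by default.
medians = df.groupby("Year")["Profit (in millions)"].median()
print(medians)
```

With errors='coerce' there is no need to replace 'N.A' first: it becomes NaN, which the median ignores.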
Solution 2:[2]
pandas str.replace only works on strings and replaces non-string values with NaN, which is why you are getting that error.
An alternative way to replace strings with numbers is to use map with a lambda function. Do this before any data aggregation:
import pandas as pd
df = pd.DataFrame()
df['Price'] = [1,2,3,'N.A.']
df['Year'] = [2001, 2001, 2002, 2002]
print(df)
df['Price'] = df['Price'].map(lambda x: 0.0 if isinstance(x, str) else x)
print(df)
df_new = df.groupby('Year')['Price'].median()
print(df_new)
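The NaN behaviour of str.replace described above can be checked on a small mixed-type column. This is a sketch with made-up values; the names prices, replaced and cleaned are illustrative only. It also shows, as an assumption about what you might prefer, a to_numeric variant that gives the same result as the map/lambda approach.

```python
import pandas as pd

# Hypothetical mixed-type column: numbers stored next to a placeholder string.
prices = pd.Series([1, 2, 3, "N.A."], dtype=object)

# The .str accessor only processes string elements; the integers become NaN.
replaced = prices.str.replace("N.A.", "0", regex=False)
print(replaced.tolist())

# Alternative: coerce the placeholder to NaN, then fill it with 0.0,
# which matches the result of the map/lambda approach.
cleaned = pd.to_numeric(prices, errors="coerce").fillna(0.0)
print(cleaned.tolist())
```

Keep in mind that filling with 0 pulls the median down, whereas leaving the values as NaN lets median() skip them.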
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Community |
Solution 2 | screamingGoose |