Pandas - DataError: No numeric types to aggregate

I have a DataFrame with 5 columns, where the column I need to aggregate holds strings and has NaN values. I tried replacing the NaN values with 0 and then converting the column to numeric, but I still get "DataError: No numeric types to aggregate" when performing a groupby and aggregate function.

df.head()

df['Profit (in millions)']= df['Profit(in millions)'].str.replace('N.A', '0')
pd.to_numeric(df['Profit (in millions)'], errors ='ignore')
df_new = df.groupby('Year')['Profit (in millions)'].median()
df_new.head(7)


Solution 1:[1]

There are two problems: you forgot to assign the result back, and with errors='ignore', if the column contains at least one non-parseable value, to_numeric returns the column unchanged:

pd.to_numeric(df['Profit (in millions)'], errors ='ignore')

to_numeric:

errors : {'ignore', 'raise', 'coerce'}, default 'raise'

If 'raise', then invalid parsing will raise an exception
If 'coerce', then invalid parsing will be set as NaN
If 'ignore', then invalid parsing will return the input
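As a rough illustration of the options quoted above, the sketch below runs to_numeric on a small made-up Series (the values are hypothetical, not the asker's data). It only exercises 'raise' and 'coerce'; 'ignore' is left out because recent pandas releases deprecate it.

```python
import pandas as pd

# Hypothetical sample values; "N.A" stands in for a non-parseable entry
s = pd.Series(["1.5", "N.A", "3"])

# errors='coerce': non-parseable entries become NaN and the result is float64
coerced = pd.to_numeric(s, errors="coerce")
print(coerced)

# errors='raise' (the default): a non-parseable entry raises ValueError
try:
    pd.to_numeric(s, errors="raise")
    raised = False
except ValueError as exc:
    raised = True
    print("to_numeric raised:", exc)
```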

So use errors='coerce' and assign the output back to the column:

df['Profit (in millions)'] = pd.to_numeric(df['Profit (in millions)'], errors='coerce')
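Put together, the fix might look like the sketch below. The DataFrame contents are invented for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical data standing in for the asker's DataFrame
df = pd.DataFrame({
    "Year": [2001, 2001, 2002, 2002],
    "Profit (in millions)": ["10.5", "N.A", "7", "3"],
})

# Convert to numeric and assign the result back; "N.A" becomes NaN
df["Profit (in millions)"] = pd.to_numeric(
    df["Profit (in millions)"], errors="coerce"
)

# The column is now float64, so the aggregation works
medians = df.groupby("Year")["Profit (in millions)"].median()
print(medians)
```

Note that median skips NaN by default, so the missing profit does not pull the 2001 value toward zero. If treating missing values as 0 is really what you want, chain .fillna(0) onto the to_numeric call before grouping.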

Solution 2:[2]

pandas str.replace works only on string values: any non-string entries in the column come back as NaN, and the column keeps its object dtype, which is why you are getting that error.
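A small sketch of that behaviour, using a made-up mixed column (the values are hypothetical). regex=False is passed so that the dot in 'N.A.' is matched literally.

```python
import pandas as pd

# Hypothetical column mixing real numbers with a string placeholder
mixed = pd.Series([1, 2, "N.A."])

# .str methods only act on string elements; the integers come back as NaN
replaced = mixed.str.replace("N.A.", "0", regex=False)
print(replaced)
print(replaced.dtype)  # still object, not a numeric dtype
```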

An alternative method to replace strings with numbers is to use map and a lambda function. You do this before any data aggregation:

import pandas as pd

df = pd.DataFrame()

df['Price'] = [1,2,3,'N.A.']
df['Year']  = [2001, 2001, 2002, 2002]
print(df)

# Replace any string placeholder (such as 'N.A.') with 0.0; leave numbers unchanged
df['Price'] = df['Price'].map(lambda x: 0.0 if isinstance(x, str) else x)
print(df)

df_new = df.groupby('Year')['Price'].median()
print(df_new)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community
Solution 2 screamingGoose