pandas dataframe replace blanks with NaN
I have a dataframe with empty cells and would like to replace these empty cells with NaN. A solution previously proposed on this forum works, but only if the cell contains a space:
df.replace(r'\s+',np.nan,regex=True)
This code does not work when the cell is empty. Does anyone have a suggestion for pandas code to replace empty cells?
Solution 1:[1]
I think the easiest thing here is to do the replace twice:
In [117]:
df = pd.DataFrame({'a':['',' ','asasd']})
df
Out[117]:
a
0
1
2 asasd
In [118]:
df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)
Out[118]:
a
0 NaN
1 NaN
2 asasd
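For anyone who wants to paste this into a script rather than an IPython session, here is a minimal self-contained sketch of the same two-step replace. The variable name `cleaned` is my own, and the `isna()` check is only there to make the result visible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['', ' ', 'asasd']})

# Step 1: cells that contain whitespace become NaN.
# Step 2: cells that are exactly the empty string become NaN.
cleaned = df.replace(r'\s+', np.nan, regex=True).replace('', np.nan)

# True marks the cells that are now NaN.
print(cleaned['a'].isna().tolist())
```

Note that replace() returns a new dataframe, so assign the result back if you want to keep it.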
Solution 2:[2]
Neither of the other answers takes into account all characters in a string. This is better:
df.replace(r'\s+( +\.)|#',np.nan,regex=True).replace('',np.nan)
More docs on: Replacing blank values (white space) with NaN in pandas
Solution 3:[3]
How about this?
df.replace(r'\s+|^$', np.nan, regex=True)
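One caveat worth hedging: `\s+` is unanchored, so it can also match the whitespace inside a value such as 'hello world', and as far as I can tell pandas then replaces the whole cell when the replacement is not a string. A sketch of an anchored variant, assuming you only want cells that are empty or consist entirely of whitespace to become NaN (the sample data and the name `cleaned` are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['', '   ', 'hello world', 'asasd']})

# ^\s*$ matches only cells that are empty or whitespace from start to end,
# so values with inner spaces such as 'hello world' are left alone.
cleaned = df.replace(r'^\s*$', np.nan, regex=True)

# True marks the cells that are now NaN.
print(cleaned['a'].isna().tolist())
```

This covers both the empty-string case and the whitespace-only case in a single call.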
Solution 4:[4]
As you've already seen, if you do the obvious thing and replace() with None it throws an error:
df.replace('', None)
TypeError: cannot replace [''] with method pad on a DataFrame
The solution seems to be to simply replace the empty string with numpy's NaN.
import numpy as np
df.replace('', np.nan)
While I'm not 100% sure that pd.NaN is treated in exactly the same way as np.nan across all edge cases, I've not had any problems: fillna() works, persisting NULLs to a database in place of np.nan works, and persisting NaN to CSV works.
(pandas version 0.18.1)
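A rough sketch of the checks described above, assuming a small made-up dataframe. The names `cleaned`, `missing_per_column`, `filled` and `csv_text`, and the 'missing' and 'NULL' placeholders, are illustrative choices rather than anything from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['', 'x'], 'b': ['y', '']})

# Exact-match replace: only cells equal to the empty string become NaN.
cleaned = df.replace('', np.nan)

# The usual missing-data tools now see those cells as missing.
missing_per_column = cleaned.isna().sum()
filled = cleaned.fillna('missing')

# When writing CSV, na_rep controls how NaN is rendered.
csv_text = cleaned.to_csv(index=False, na_rep='NULL')

print(missing_per_column)
print(filled)
print(csv_text)
```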
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | EdChum |
Solution 2 | Community |
Solution 3 | UNagaswamy |
Solution 4 | deepgeek |