How to remove NaN from the columns
I have a DataFrame that contains NaN values. I need to remove the NaN rows at the start only and keep any NaN that appears after the first real number:
Suppose my data frame is something like:
a = pd.DataFrame({'data':[np.nan,np.nan,np.nan,np.nan,4,5,6,2,np.nan,1,3,4,5,np.nan,4,5,np.nan,np.nan]})
a=
data
0 NaN
1 NaN
2 NaN
3 NaN
4 4.0
5 5.0
6 6.0
7 2.0
8 NaN
9 1.0
10 3.0
11 4.0
12 5.0
13 NaN
14 4.0
15 5.0
16 NaN
17 NaN
I want to remove the NaN values at the beginning so that the DataFrame looks like this:
data
4 4.0
5 5.0
6 6.0
7 2.0
8 NaN
9 1.0
10 3.0
11 4.0
12 5.0
13 NaN
14 4.0
15 5.0
16 NaN
17 NaN
I tried the following loop, but it does not work:
for w in np.arange(len(a)):
    if a.iloc[w] == np.nan:
        a.drop(a.index[w])
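Two things likely keep the loop above from having any effect: NaN never compares equal to anything, itself included, so an `== np.nan` test cannot detect it, and `DataFrame.drop` returns a new frame rather than modifying `a`. A minimal sketch of both points, using a shortened version of the example data (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'data': [np.nan, np.nan, 4, 5, np.nan, 1]})

# NaN is not equal to itself, so an equality test cannot find it.
nan_equals_nan = (np.nan == np.nan)          # False

# pd.isna (or Series.isna) is the reliable check.
first_is_nan = pd.isna(a['data'].iloc[0])    # True

# drop() returns a new DataFrame; `a` is unchanged unless the
# result is assigned back.
dropped = a.drop(a.index[0])
```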
Solution 1:[1]
Get the first valid index and slice
idx = a.first_valid_index()
a.loc[idx:]
data
4 4.0
5 5.0
6 6.0
7 2.0
8 NaN
9 1.0
10 3.0
11 4.0
12 5.0
13 NaN
14 4.0
15 5.0
16 NaN
17 NaN
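One edge case worth keeping in mind: `first_valid_index()` returns `None` when there is no valid value at all, and `a.loc[None:]` is the same as `a.loc[:]`, so an all-NaN frame would come back untouched. A small sketch of a guard for that case, assuming an empty result is what you want there (`drop_leading_nan` is a hypothetical helper name, not a pandas function):

```python
import numpy as np
import pandas as pd

def drop_leading_nan(df, column):
    """Hypothetical helper: drop rows before the first non-NaN value in `column`."""
    idx = df[column].first_valid_index()
    if idx is None:
        # No valid value anywhere: treat every row as leading NaN.
        return df.iloc[0:0]
    # .loc slices by label and includes the starting label.
    return df.loc[idx:]

a = pd.DataFrame({'data': [np.nan, np.nan, 4, 5, np.nan, 1, np.nan]})
trimmed = drop_leading_nan(a, 'data')

all_nan = pd.DataFrame({'data': [np.nan, np.nan]})
empty = drop_leading_nan(all_nan, 'data')
```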
Solution 2:[2]
Try something like this:
start = a[a.data.notnull()].index[0]
new_df = a.loc[start:]
The first line finds the index label of the first non-null value; the second drops every row before that label from the DataFrame.
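Because `.loc` slices by label, the same two lines should also work when the index is not the default 0..n-1. A short sketch with a made-up string index (the labels `v` to `z` are purely illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'data': [np.nan, np.nan, 4.0, np.nan, 5.0]},
    index=list('vwxyz'),
)

# Label of the first row whose value is not null.
start = df[df['data'].notnull()].index[0]

# Label-based slice: keeps `start` and everything after it.
new_df = df.loc[start:]
```

Note that `index[0]` raises an `IndexError` if the column has no non-null value, so this version needs a check for that case too.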
Solution 3:[3]
Instead of removing the "bad" rows, you can locate and preserve the "good" rows:
b = a[a.data.ffill().notnull()]
# data
#4 4.0
#5 5.0
#6 6.0
#7 2.0
#8 NaN
#9 1.0
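To see why the forward-fill mask works: a forward fill copies the last valid value into each later gap, so the only values that stay NaN are the leading ones, which have nothing before them to copy. A step-by-step sketch on a shortened version of the data, using `Series.ffill()`:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'data': [np.nan, np.nan, 4, 5, np.nan, 1, np.nan]})

# After the forward fill only the leading NaNs remain NaN.
filled = a['data'].ffill()

# True from the first valid value onwards, False before it.
mask = filled.notna()

# Index the original frame, so later NaNs are kept as they were.
b = a[mask]
```

With this approach an all-NaN column gives an empty result without any special handling.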
Solution 4:[4]
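You should use first_valid_index(), but here is another way :-)
# Series.nonzero() was removed in pandas 1.0, so convert to NumPy first; nonzero() returns positions, hence iloc
a.iloc[a.data.notnull().to_numpy().nonzero()[0][0]:]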
Out[1276]:
data
4 4.0
5 5.0
6 6.0
7 2.0
8 NaN
9 1.0
10 3.0
11 4.0
12 5.0
13 NaN
14 4.0
15 5.0
16 NaN
17 NaN
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Vaishali |
Solution 2 | sacuL |
Solution 3 | DYZ |
Solution 4 | BENY |