'How to delete empty spaces from pandas DataFrame rows until first populated field?
Lets say I imported a really messy data from a PFD and I´m cleaning it. I have something like this:
Name | Type | Date | other1 | other2 | other3 |
---|---|---|---|---|---|
Name1 | '' | '' | Type1 | '' | Date1 |
Name2 | '' | '' | '' | Type2 | Date2 |
Name3 | '' | '' | Type3 | Date3 | '' |
Name4 | '' | Type4 | '' | '' | Date4 |
Name5 | Type5 | '' | Date5 | '' | '' |
And so on. As you can see, Type is always before date on each row, but I basically need to delete all '' (currently empty strings on the DataFrame) while moving everything to the left so they align with their respective Type and Date columns. Additionally, there's more columns to the right with the same problem, but for structural reasons I cant remove ALL '', the solution I´m looking for would just move 'everything to the left' so to speak (as it happens with pd.shift).
I appreciate your help.
Solution 1:[1]
What worked for me was:
while '' in df['Type'].unique():
for i,row in df.iterrows():
if row['Type'] == '':
df.iloc[i, 1:] = df.iloc[i, 1:].shift(-1, fill_value='')
And the same for next column
Solution 2:[2]
data = df.values.flatten()
pd.DataFrame(data[data != ""].reshape(-1, 3), columns = ['Name','Type', 'Date'])
or:
pd.DataFrame(df.values[df.values != ""].reshape(-1, 3), columns = ['Name','Type', 'Date'])
output:
Name Type Date
0 Name1 Type1 Date1
1 Name2 Type2 Date2
2 Name3 Type3 Date3
3 Name4 Type4 Date4
4 Name5 Type5 Date5
without reshape:
pd.DataFrame(df.apply(lambda x: (a:=np.array(x))[a != ""] , axis = 1).values.tolist())
or:
s = df[0].copy()
for col in df.columns[1:]:
s += " " + df[col]
pd.DataFrame(s.str.split().values.tolist(), columns = ['Name','Type', 'Date'])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 |