Pandas: how to filter out rows containing a string pattern within a list in a column?
I have a data frame that looks similar to the following:
import pandas as pd

df = pd.DataFrame({
    'employee_id' : [123, 456, 789],
    'country_code' : ['US', 'CAN', 'MEX'],
    'comments' : (['good performer', 'due for raise', 'should be promoted'],
                  ['bad performer', 'should be fired', 'speak to HR'],
                  ['recently hired', 'needs training', 'shows promise'])
})
df
   employee_id country_code  comments
0          123           US  [good performer, due for raise, should be promoted]
1          456          CAN  [bad performer, should be fired, speak to HR]
2          789          MEX  [recently hired, needs training, shows promise]
I would like to filter the data frame to remove any rows whose comments column contains the string 'performer'. To do so, I'm using:
df = df[~df['comments'].str.contains('performer')]
But this returns an error:
TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'
Thanks in advance for any assistance you can give!
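For context, a likely reason the call fails: the .str accessor expects each element to be a string, and for elements that are lists str.contains yields missing values (NaN) rather than True/False, so ~ has no boolean mask to invert. A minimal sketch, assuming the same data as above (the name result is only illustrative, and the comments column is written as a list of lists here):

```python
import pandas as pd

df = pd.DataFrame({
    'employee_id': [123, 456, 789],
    'country_code': ['US', 'CAN', 'MEX'],
    'comments': [['good performer', 'due for raise', 'should be promoted'],
                 ['bad performer', 'should be fired', 'speak to HR'],
                 ['recently hired', 'needs training', 'shows promise']]
})

# Each element is a list, not a string, so the pattern match cannot be applied
result = df['comments'].str.contains('performer')

# Check whether every entry came back as a missing value
print(result.isna().all())
```

If the check prints True, the mask holds no booleans at all, which is consistent with the TypeError raised by ~.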
Solution 1:[1]
If I understand correctly, you need to convert each list in the comments column into a single string before calling str.contains:
import pandas as pd

df = pd.DataFrame({
    'employee_id' : [123, 456, 789],
    'country_code' : ['US', 'CAN', 'MEX'],
    'comments' : (['good performer', 'due for raise', 'should be promoted'],
                  ['bad performer', 'should be fired', 'speak to HR'],
                  ['recently hired', 'needs training', 'shows promise'])
})
# Join each list into one space-separated string (this overwrites the lists)
df['comments'] = df['comments'].apply(lambda x : ' '.join(x))
# Keep only the rows whose joined comments do not mention 'performer'
df = df[~df['comments'].str.contains('performer')]
df
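One side effect of the approach above is that the comments column ends up holding strings instead of lists. If the lists need to survive, a possible variation is to build the mask from a temporary joined Series and leave the column alone. This is a sketch rather than part of the original answer; the names joined, mask and filtered are illustrative, and it assumes every element of comments is a list of strings:

```python
import pandas as pd

df = pd.DataFrame({
    'employee_id': [123, 456, 789],
    'country_code': ['US', 'CAN', 'MEX'],
    'comments': [['good performer', 'due for raise', 'should be promoted'],
                 ['bad performer', 'should be fired', 'speak to HR'],
                 ['recently hired', 'needs training', 'shows promise']]
})

# Join each list into one string in a temporary Series; df['comments'] is untouched
joined = df['comments'].str.join(' ')

# True where the joined text of a row mentions 'performer' (plain substring match)
mask = joined.str.contains('performer', regex=False)

# Drop the matching rows; the surviving rows still hold their original lists
filtered = df[~mask]
print(filtered)
```

Passing regex=False treats 'performer' as a literal substring, which is enough here and avoids surprises if the search term ever contains regex metacharacters.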
Solution 2:[2]
Because the Series holds lists, the vectorized string methods cannot search inside them. You can use a list comprehension instead:
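# Keep a row only if none of its comments contains 'performer'
mask = [all('performer' not in comment for comment in comments)
        for comments in df['comments']]
df2 = df[mask]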
Output:
   employee_id country_code  comments
2          789          MEX  [recently hired, needs training, shows promise]
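An alternative that stays within pandas, offered as a sketch and not as part of the original answer, is to explode the lists into one row per comment, test each comment, and then collapse the result back to one flag per original row. The names exploded, has_match and filtered are illustrative, and the sketch assumes the index of df is unique:

```python
import pandas as pd

df = pd.DataFrame({
    'employee_id': [123, 456, 789],
    'country_code': ['US', 'CAN', 'MEX'],
    'comments': [['good performer', 'due for raise', 'should be promoted'],
                 ['bad performer', 'should be fired', 'speak to HR'],
                 ['recently hired', 'needs training', 'shows promise']]
})

# One row per individual comment; the original index label is repeated
exploded = df['comments'].explode()

# Test each comment, then reduce back to one flag per original row.
# na=False treats missing entries (e.g. from an empty list) as "no match".
has_match = (exploded.str.contains('performer', regex=False, na=False)
             .groupby(level=0).any())

# Keep the rows where no comment matched; the list column is left intact
filtered = df[~has_match]
print(filtered)
```

For a frame this small the list comprehension is simpler; the explode version may be more convenient when further per-comment processing is needed anyway.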
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ArchAngelPwn |
| Solution 2 | mozway |