'Filter Pandas Dataframe based on List of substrings
I have a Pandas Dataframe containing multiple colums of strings. I now like to check a certain column against a list of allowed substrings and then get a new subset with the result.
substr = ['A', 'C', 'D']
df = pd.read_excel('output.xlsx')
df = df.dropna()
# now filter all rows where the string in the 2nd column doesn't contain one of the substrings
The only approach I found was creating a List of the corresponding column an then do a list comprehension, but then I loose the other columns. Can I use list comprehension as part of e.g. df.str.contains()
?
year type value price
2000 ty-A 500 10000
2002 ty-Q 200 84600
2003 ty-R 500 56000
2003 ty-B 500 18000
2006 ty-C 500 12500
2012 ty-A 500 65000
2018 ty-F 500 86000
2019 ty-D 500 51900
expected output:
year type value price
2000 ty-A 500 10000
2006 ty-C 500 12500
2012 ty-A 500 65000
2019 ty-D 500 51900
Solution 1:[1]
You could use pandas.Series.isin
>>> df.loc[df['type'].isin(substr)]
year type value price
0 2000 A 500 10000
4 2006 C 500 12500
5 2012 A 500 65000
7 2019 D 500 51900
Solution 2:[2]
you could use pandas.DataFrame.any or pandas.DataFrame.all
if you want where all instances match
df.loc[df['type'].apply(lambda x: all( word in x for word in substr)
or if you want any from the substr
df.loc[df['type'].apply(lambda x: any( word in x for word in substr)
That should if you print or return df a filtered list.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | help-ukraine-now |
Solution 2 | tylerjames |