'Filter Pandas Dataframe based on List of substrings

I have a Pandas Dataframe containing multiple colums of strings. I now like to check a certain column against a list of allowed substrings and then get a new subset with the result.

substr = ['A', 'C', 'D']
df = pd.read_excel('output.xlsx')
df = df.dropna()
# now filter all rows where the string in the 2nd column doesn't contain one of the substrings

The only approach I found was creating a List of the corresponding column an then do a list comprehension, but then I loose the other columns. Can I use list comprehension as part of e.g. df.str.contains()?

year  type     value   price
2000  ty-A     500     10000
2002  ty-Q     200     84600
2003  ty-R     500     56000
2003  ty-B     500     18000
2006  ty-C     500     12500
2012  ty-A     500     65000
2018  ty-F     500     86000
2019  ty-D     500     51900

expected output:

year  type     value   price
2000  ty-A     500     10000
2006  ty-C     500     12500
2012  ty-A     500     65000
2019  ty-D     500     51900


Solution 1:[1]

You could use pandas.Series.isin

>>> df.loc[df['type'].isin(substr)]
   year type  value  price
0  2000    A    500  10000
4  2006    C    500  12500
5  2012    A    500  65000
7  2019    D    500  51900

Solution 2:[2]

you could use pandas.DataFrame.any or pandas.DataFrame.all

if you want where all instances match

df.loc[df['type'].apply(lambda x: all( word in x for word in substr)

or if you want any from the substr

df.loc[df['type'].apply(lambda x: any( word in x for word in substr)

That should if you print or return df a filtered list.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 help-ukraine-now
Solution 2 tylerjames