'numpy where with multiple conditions linked to dataframe

I'm using numpy where with multiple conditions to assign a category based on a text string a transaction description.

Part of the code is below

`import numpy as np

conditions = [
    df2['description'].str.contains('AEGON', na=False),                        
    df2['description'].str.contains('IB/PVV', na=False),                   
    df2['description'].str.contains('Picnic', na=False),                   
    df2['description'].str.contains('Jumbo', na=False),  
]    
values = [
    'Hypotheek',                                                  
    'Hypotheek',                                                  
    'Boodschappen',                                               
    'Boodschappen']   

df2['Classificatie'] = np.select(conditions, values, default='unknown')
                                        

I have many conditions which - only partly shown here. I want to create a table / dataframe in stead of including every seperate condition and value in the code. So for instance the following dataframe:

import pandas as pd

Conditions = {'Condition': ['AEGON','IB/PVV','Picnic','Jumbo'],
         'Value': ['Hypotheek','Hypotheek','Boodschappen','Boodschappen']
         
        }

df_conditions = pd.DataFrame(Conditions, columns= ['Condition','Value'])

How can I adjust the condition to look for (in the str.contains) a text string as listed in df_condictions['condition'] and to apply the Value column to df2['Classificatie']? The values are already a list in the variable explorer, but I can't find a way to have the str.contains to look for a value in a list / dataframe.

    Desired outcome:
    In [3]: iwantthis
    Out[3]:
       Description               Classificatie
    0  groceries Jumbo on date   boodschappen         
1  mortgage payment Aegon.    Hypotheek
    2  transfer picnic.           Boodschappen

The first column is the input data frame, te second column is what I'm looking for.

Please note that my current code already allows me to create this column, but I want to use another more automated way using de df_condtions table.

I'm not yet really familiair with Python and I can't find anything online.



Solution 1:[1]

Try:

import re

df_conditions["Condition"] = df_conditions["Condition"].str.lower()
df_conditions = df_conditions.set_index("Condition")

tmp = df["Description"].str.extract(
    "(" + "|".join(re.escape(c) for c in df_conditions.index) + ")",
    flags=re.I,
)

df["Classificatie"] = tmp[0].str.lower().map(df_conditions["Value"])
print(df)

Prints:

               Description Classificatie
0  groceries Jumbo on date  Boodschappen
1  mortgage payment Aegon.     Hypotheek
2         transfer picnic.  Boodschappen

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Andrej Kesely