'numpy where with multiple conditions linked to dataframe
I'm using numpy where with multiple conditions to assign a category based on a text string a transaction description.
Part of the code is below
`import numpy as np
conditions = [
df2['description'].str.contains('AEGON', na=False),
df2['description'].str.contains('IB/PVV', na=False),
df2['description'].str.contains('Picnic', na=False),
df2['description'].str.contains('Jumbo', na=False),
]
values = [
'Hypotheek',
'Hypotheek',
'Boodschappen',
'Boodschappen']
df2['Classificatie'] = np.select(conditions, values, default='unknown')
I have many conditions which - only partly shown here. I want to create a table / dataframe in stead of including every seperate condition and value in the code. So for instance the following dataframe:
import pandas as pd
Conditions = {'Condition': ['AEGON','IB/PVV','Picnic','Jumbo'],
'Value': ['Hypotheek','Hypotheek','Boodschappen','Boodschappen']
}
df_conditions = pd.DataFrame(Conditions, columns= ['Condition','Value'])
How can I adjust the condition to look for (in the str.contains) a text string as listed in df_condictions['condition'] and to apply the Value column to df2['Classificatie']? The values are already a list in the variable explorer, but I can't find a way to have the str.contains to look for a value in a list / dataframe.
Desired outcome:
In [3]: iwantthis
Out[3]:
Description Classificatie
0 groceries Jumbo on date boodschappen
1 mortgage payment Aegon. Hypotheek
2 transfer picnic. Boodschappen
The first column is the input data frame, te second column is what I'm looking for.
Please note that my current code already allows me to create this column, but I want to use another more automated way using de df_condtions table.
I'm not yet really familiair with Python and I can't find anything online.
Solution 1:[1]
Try:
import re
df_conditions["Condition"] = df_conditions["Condition"].str.lower()
df_conditions = df_conditions.set_index("Condition")
tmp = df["Description"].str.extract(
"(" + "|".join(re.escape(c) for c in df_conditions.index) + ")",
flags=re.I,
)
df["Classificatie"] = tmp[0].str.lower().map(df_conditions["Value"])
print(df)
Prints:
Description Classificatie
0 groceries Jumbo on date Boodschappen
1 mortgage payment Aegon. Hypotheek
2 transfer picnic. Boodschappen
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Andrej Kesely |