'Dynamic bin per row in a dataset in Pandas
I am having trouble dynamically binning my dataset for further calculation. My goal is to have specific bin/labels for each individual row in my dataframe, based on a function, and have the corresponding label assign to the column 'action'.
My dataset is:
id value1 value2 type length amount
1 0.9 1.0 X 10 ['A', 'B']
2 2.0 1.6 Y 80 ['A']
3 0.3 0.5 X 29 ['A', 'C']
The function is as follows:
def bin_label_generator(amount):
if amount< 2:
amount= 2
lower_bound = 1.0 - (1.0/amount)
mid_bound = 1.0
upper_bound = 1.0 + (1.0/amount)
thresholds = {
'bins':[-np.inf, lower_bound, mid_bound, upper_bound, np.inf],
'labels':[0, 1.0, 2.0, 3.0]
}
return thresholds
This is my current code, but it requires me to specify a row in order to cut. I would want this to happen automatically with the dictionary specified in the row itself.
# filter on type
filter_type_series = df['type'].str.contains('X')
# get amount of items in amount list
amount_series = df[filter_type_series ]['amount'].str.len()
# generate bins for each row in series
bins_series = amount_series.apply(bin_label_generator)
# get the max values to for binning
max_values = df[filter_type_series].loc[:, [value1, value2]].abs().max(1)
# following line requires a row index, what I do not want
df['action'] = pd.cut(max_values, bins=bins_series[0]['bins'], labels=bins_series[0]['labels'])
Solution 1:[1]
Found a fix myself, by just iterating over every single row in the series, and then adding it towards the columns in the actual df.
type = 'X'
first_df = df.copy()
type_series = mst_df['type'].str.contains(type)
# loop over every row to dynamically use pd.cut with bins/labels from specific row
for index, row in mst_df[mst_series].iterrows():
# get the max value from rows
max_val = row[[value1, value2]].abs().max()
# get amount of cables
amount = len(row['amount'])
# get bins and labels for specific row
bins_label_dict = bin_label_generator(amount)
bins = bins_label_dict['bins']
labels = bins_label_dict['labels']
# append label to row with max value
first_df .loc[index, 'action'] = pd.cut([max_val], bins=bins, labels=labels)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | mmhverheijden |