'Printing values in new columns based on a condition from another column
I have a following dataframe:
Time | Tab | User | Description |
---|---|---|---|
27.10.2021 15:58:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO opened by User A |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Start edit of part studio feature |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Cancel Operation |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO opened by User B |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Start edit of part studio feature |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Cancel Operation |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Add assembly feature |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO closed by User A |
27.10.2021 16:03:00 | Tab Beta | [email protected] | Tab Beta of type PARTSTUDIO opened by User A |
27.10.2021 16:15:00 | Tab Beta | [email protected] | Start edit of part studio feature |
27.10.2021 16:15:00 | Tab Alpha | [email protected] | Start edit of part studio feature |
27.10.2021 16:15:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO closed by User B |
27.10.2021 16:17:00 | Tab Beta | [email protected] | Add assembly feature |
27.10.2021 16:17:00 | Tab Beta | [email protected] | Tab Beta of type ASSEMBLY opened by User C |
27.10.2021 16:17:00 | Tab Beta | [email protected] | Add assembly feature |
27.10.2021 16:17:00 | Tab Delta | [email protected] | Tab Delta of type PARTSTUDIO opened by User B |
27.10.2021 16:54:00 | Tab Delta | [email protected] | Add assembly feature |
27.10.2021 16:54:00 | Tab Beta | [email protected] | Tab Beta of type ASSEMBLY closed by User C |
27.10.2021 16:55:00 | Tab Beta | [email protected] | Tab Beta of type PARTSTUDIO closed by User A |
27.10.2021 16:55:00 | Tab Delta | [email protected] | Start edit of part studio feature |
27.10.2021 16:55:00 | Tab Delta | [email protected] | Tab Delta of type PARTSTUDIO closed by User B |
How to print values (Tab names) in a new column (UserA, B or C) based on a condition in the "Descripiton" column? The condition is to print said values (Tab names) between values "Tab 'Tab_name' of type ... opened by User X" and "Tab 'Tab_name' of type ... closed by User X" into the belonging column.
Expected output:
Time | Tab | User | Description | UserA | UserB | UserC |
---|---|---|---|---|---|---|
27.10.2021 15:58:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO opened by User A | Tab Alpha | ||
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Start edit of part studio feature | Tab Alpha | ||
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Cancel Operation | Tab Alpha | ||
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO opened by User B | Tab Alpha | Tab Alpha | |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Start edit of part studio feature | Tab Alpha | Tab Alpha | |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Cancel Operation | Tab Alpha | Tab Alpha | |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Add assembly feature | Tab Alpha | Tab Alpha | |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO closed by User A | Tab Alpha | Tab Alpha | |
27.10.2021 16:03:00 | Tab Beta | [email protected] | Tab Beta of type PARTSTUDIO opened by User A | Tab Beta | Tab Alpha | |
27.10.2021 16:15:00 | Tab Beta | [email protected] | Start edit of part studio feature | Tab Beta | Tab Alpha | |
27.10.2021 16:15:00 | Tab Alpha | [email protected] | Start edit of part studio feature | Tab Beta | Tab Alpha | |
27.10.2021 16:15:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO closed by User B | Tab Beta | Tab Alpha | |
27.10.2021 16:17:00 | Tab Beta | [email protected] | Add assembly feature | Tab Beta | ||
27.10.2021 16:17:00 | Tab Beta | [email protected] | Tab Beta of type ASSEMBLY opened by User C | Tab Beta | Tab Beta | |
27.10.2021 16:17:00 | Tab Beta | [email protected] | Add assembly feature | Tab Beta | Tab Beta | |
27.10.2021 16:17:00 | Tab Delta | [email protected] | Tab Delta of type PARTSTUDIO opened by User B | Tab Beta | Tab Delta | Tab Beta |
27.10.2021 16:54:00 | Tab Delta | [email protected] | Add assembly feature | Tab Beta | Tab Delta | Tab Beta |
27.10.2021 16:54:00 | Tab Beta | [email protected] | Tab Beta of type ASSEMBLY closed by User C | Tab Beta | Tab Delta | Tab Beta |
27.10.2021 16:55:00 | Tab Beta | [email protected] | Tab Beta of type PARTSTUDIO closed by User A | Tab Beta | Tab Delta | |
27.10.2021 16:55:00 | Tab Delta | [email protected] | Start edit of part studio feature | Tab Delta | ||
27.10.2021 16:55:00 | Tab Delta | [email protected] | Tab Delta of type PARTSTUDIO closed by User B | Tab Delta |
Solution 1:[1]
Possible solution is the following:
# pip install pandas
import pandas as pd
import re
# set data and create dataframe
data = {'Time': {0: '27.10.2021 15:58:00 ', 1: '27.10.2021 15:59:00 ', 2: '27.10.2021 15:59:00 ', 3: '27.10.2021 15:59:00 ', 4: '27.10.2021 15:59:00 ', 5: '27.10.2021 16:03:00 ', 6: '27.10.2021 16:03:00 ', 7: '27.10.2021 16:03:00 ', 8: '27.10.2021 16:03:00 ', 9: '27.10.2021 16:15:00 ', 10: '27.10.2021 16:15:00 ', 11: '27.10.2021 16:15:00 ', 12: '27.10.2021 16:17:00 ', 13: '27.10.2021 16:17:00 ', 14: '27.10.2021 16:17:00 ', 15: '27.10.2021 16:17:00 ', 16: '27.10.2021 16:54:00 ', 17: '27.10.2021 16:54:00 ', 18: '27.10.2021 16:55:00 ', 19: '27.10.2021 16:55:00 ', 20: '27.10.2021 16:55:00 '}, 'Tab': {0: 'Tab Alpha ', 1: 'Tab Alpha ', 2: 'Tab Alpha ', 3: 'Tab Alpha ', 4: 'Tab Alpha ', 5: 'Tab Alpha ', 6: 'Tab Alpha ', 7: 'Tab Alpha ', 8: 'Tab Beta ', 9: 'Tab Beta ', 10: 'Tab Alpha ', 11: 'Tab Alpha ', 12: 'Tab Beta ', 13: 'Tab Beta ', 14: 'Tab Beta ', 15: 'Tab Delta ', 16: 'Tab Delta ', 17: 'Tab Beta ', 18: 'Tab Beta ', 19: 'Tab Delta ', 20: 'Tab Delta '}, 'User': {0: '[email protected] ', 1: '[email protected] ', 2: '[email protected] ', 3: '[email protected] ', 4: '[email protected] ', 5: '[email protected] ', 6: '[email protected] ', 7: '[email protected] ', 8: '[email protected] ', 9: '[email protected] ', 10: '[email protected] ', 11: '[email protected] ', 12: '[email protected] ', 13: '[email protected] ', 14: '[email protected] ', 15: '[email protected] ', 16: '[email protected] ', 17: '[email protected] ', 18: '[email protected] ', 19: '[email protected] ', 20: '[email protected] '}, 'Description': {0: 'Tab Alpha of type PARTSTUDIO opened by User A', 1: 'Start edit of part studio feature', 2: 'Cancel Operation', 3: 'Tab Alpha of type PARTSTUDIO opened by User B', 4: 'Start edit of part studio feature', 5: 'Cancel Operation', 6: 'Add assembly feature', 7: 'Tab Alpha of type PARTSTUDIO closed by User A', 8: 'Tab Beta of type PARTSTUDIO opened by User A', 9: 'Start edit of part studio feature', 10: 'Start edit of part studio feature', 11: 'Tab Alpha of type PARTSTUDIO closed by User B', 12: 'Add assembly feature', 13: 'Tab Beta of type ASSEMBLY opened by User C', 14: 'Add assembly feature', 15: 'Tab Delta of type PARTSTUDIO opened by User B', 16: 'Add assembly feature', 17: 'Tab Beta of type ASSEMBLY closed by User C', 18: 'Tab Beta of type PARTSTUDIO closed by User A', 19: 'Start edit of part studio feature', 20: 'Tab Delta of type PARTSTUDIO closed by User B'}}
df = pd.DataFrame(data)
# ----------------------------------
# collect unique sets of user names and tabs
users = [user.split("@")[0] for user in df['User'].tolist()]
tabs = [user.strip() for user in df['Tab'].tolist()]
tabs_users = sorted(list(set(zip(users, tabs))))
# create and fill new columns for each user
for item in tabs_users:
user = item[0]
tab = item[1]
user_str = item[0][:-1] + ' ' + item[0][-1]
open_string = re.compile('{} .+ opened by {}'.format(tab, user_str))
close_string = re.compile('{} .+ closed by {}'.format(tab, user_str))
# get cell index where tab was opened and closed by user
try:
idx_open = df.index[df['Description'].str.contains(open_string)][0]
idx_close = df.index[df['Description'].str.contains(close_string)][0]
df.loc[idx_open: idx_close, user] = tab
except IndexError:
pass
df.fillna("")
Returns
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | gremur |