Pandas find consecutive ones, column-wise
I have an output data frame like the one below, and I want to reformat it so that I can use it further down the pipeline.
A few pointers about the data frame:
1) This data frame holds the weekly workload data for employees.
2) The columns load0, load30, load100, etc. each represent a half-hour slot (one shift slot per column).
3) The first "1" in a row marks the start of a shift, and "BREAK" marks a break slot.
For example, in row 1, the shift of employee 1234 starts at 12:00 and ends at 2:30, with a break from 1:00 to 1:30.
employee date store load0 load30 load100 load130 load200 load230 load300
1234 2021-12-1 450 1 1 BREAK 1 1 0 0
1234 2021-12-2 450 0 1 1 BREAK 1 1 0
5678 2021-12-1 650 0 0 0 0 1 1 0
5678 2021-12-2 650 0 0 1 1 BREAK 1 0
For the above example the output should be something like:
Start End Segment type
date+12:00:00 date+1:00:00 Regular segment
date+1:00:00 date+1:30:00 Break segment
date+1:30:00 date+2:30:00 Regular segment
P.S. There are around 350 employees, and each employee has a schedule like this for fewer than 7 days in a week.
I want the output to look like this:
employee store Start End SegmentType
0 1234 450 2021-12-1T12:00:00Z 2021-12-1T12:30:00Z REGULAR_SEGMENT
1 1234 450 2021-12-1T1:00:00Z 2021-12-1T1:30:00Z BREAK_SEGMENT
2 1234 450 2021-12-1T1:30:00Z 2021-12-1T2:00:00Z REGULAR_SEGMENT
3 1234 450 2021-12-2T12:30:00Z 2021-12-2T1:00:00Z REGULAR_SEGMENT
4 1234 450 2021-12-2T1:30:00Z 2021-12-2T2:20:00Z BREAK_SEGMENT
5 1234 450 2021-12-2T2:00:00Z 2021-12-2T2:30:00Z REGULAR_SEGMENT
6 5678 650 2021-12-1T2:00:00Z 2021-12-1T2:30:00Z REGULAR_SEGMENT
7 5678 650 2021-12-2T1:00:00Z 2021-12-1T2:30:00Z REGULAR_SEGMENT
8 5678 650 2021-12-2T2:00:00Z 2021-12-2T2:00:00Z BREAK_SEGMENT
9 5678 650 2021-12-2T2:30:00Z 2021-12-2T2:30:00Z REGULAR_SEGMENT
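The Start and End values in the table above follow an ISO 8601 style layout (date, "T", time, trailing "Z"). As a minimal sketch, assuming the start and end times are already held as pandas datetimes, `Series.dt.strftime` can produce that layout. The two-row `segments` frame here is hypothetical and only stands in for the real output. Note that `strftime` zero-pads the day and uses a 24-hour clock, so 1:00 in the afternoon comes out as `13:00:00`.

```python
import pandas as pd

# Hypothetical stand-in for the real segment output (not data from the question).
segments = pd.DataFrame({
    "Start": pd.to_datetime(["2021-12-01 12:00", "2021-12-01 13:00"]),
    "End": pd.to_datetime(["2021-12-01 12:30", "2021-12-01 13:30"]),
})

# Date, literal 'T', 24-hour time, literal 'Z' suffix.
iso_layout = "%Y-%m-%dT%H:%M:%SZ"
for col in ["Start", "End"]:
    segments[col] = segments[col].dt.strftime(iso_layout)

print(segments)
```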
Solution 1:[1]
I hope this will work!
import pandas as pd

def segment_type(df: pd.DataFrame) -> pd.DataFrame:
    # Reshape to long form: one row per (employee, date, store, slot).
    df_melt = df.melt(id_vars=['employee', 'date', 'store'], var_name='time')
    # 'load130' -> 130, read as h:mm after the first slot.
    df_melt['time'] = df_melt['time'].str.replace('load', '').astype(int)
    # Assumes load0 is 12:00 noon, so load100 is 13:00, load200 is 14:00, ...
    hours = df_melt['time'] // 100 + 12
    minutes = df_melt['time'] % 100
    df_melt['start'] = (pd.to_datetime(df_melt['date'], format='%Y-%m-%d')
                        + pd.to_timedelta(hours, unit='h')
                        + pd.to_timedelta(minutes, unit='min'))
    df_melt['end'] = df_melt['start'] + pd.Timedelta(minutes=30)
    # Compare as strings so that both 1 and '1' count as a worked slot.
    df_melt['value'] = df_melt['value'].astype(str)
    # .copy() avoids SettingWithCopyWarning on the assignment below.
    df_melt = df_melt[df_melt['value'].isin(['1', 'BREAK'])].copy()
    df_melt['SegmentType'] = ['REGULAR_SEGMENT' if x == '1' else
                              'BREAK_SEGMENT' for x in df_melt['value']]
    # Keep 'store', which the desired output in the question includes.
    df_melt = df_melt[['employee', 'store', 'start', 'end', 'SegmentType']]
    df_melt.sort_values(['employee', 'start'], inplace=True, ignore_index=True)
    return df_melt

# df is the data frame shown in the question.
new_frame = segment_type(df)
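
The function above returns one row per half-hour slot. The first example in the question instead collapses consecutive slots of the same type into a single segment (12:00 to 1:00 regular, then the break, and so on). A possible sketch of that merging step follows. It is not part of the original answer: `merge_segments`, the `run` column, and the assumption that load0 means 12:00 noon are illustrative choices, and the sample frame is typed in by hand from the question's table.

```python
import pandas as pd

# Sample rows typed in from the question; slot values kept as strings.
df = pd.DataFrame(
    [
        [1234, "2021-12-1", 450, "1", "1", "BREAK", "1", "1", "0", "0"],
        [1234, "2021-12-2", 450, "0", "1", "1", "BREAK", "1", "1", "0"],
        [5678, "2021-12-1", 650, "0", "0", "0", "0", "1", "1", "0"],
        [5678, "2021-12-2", 650, "0", "0", "1", "1", "BREAK", "1", "0"],
    ],
    columns=["employee", "date", "store", "load0", "load30", "load100",
             "load130", "load200", "load230", "load300"],
)


def merge_segments(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse consecutive half-hour slots of the same type into one segment."""
    long = df.melt(id_vars=["employee", "date", "store"], var_name="slot")
    # 'load130' -> 130 -> 1 h 30 min after the first slot (assumed to be 12:00).
    offset = long["slot"].str.replace("load", "", regex=False).astype(int)
    long["start"] = (
        pd.to_datetime(long["date"], format="%Y-%m-%d")
        + pd.to_timedelta(offset // 100 + 12, unit="h")
        + pd.to_timedelta(offset % 100, unit="min")
    )
    long["end"] = long["start"] + pd.Timedelta(minutes=30)

    # Keep worked and break slots only; idle '0' slots are dropped.
    long["value"] = long["value"].astype(str)
    long = long[long["value"].isin(["1", "BREAK"])].copy()
    long["SegmentType"] = long["value"].map(
        {"1": "REGULAR_SEGMENT", "BREAK": "BREAK_SEGMENT"}
    )
    long = long.sort_values(["employee", "start"], ignore_index=True)

    # A new run begins when the employee or the type changes,
    # or when a slot does not start where the previous one ended.
    new_run = (
        (long["employee"] != long["employee"].shift())
        | (long["SegmentType"] != long["SegmentType"].shift())
        | (long["start"] != long["end"].shift())
    )
    long["run"] = new_run.cumsum()

    # One output row per run: earliest start, latest end.
    return (
        long.groupby("run", as_index=False)
        .agg(
            employee=("employee", "first"),
            store=("store", "first"),
            start=("start", "min"),
            end=("end", "max"),
            SegmentType=("SegmentType", "first"),
        )
        .drop(columns="run")
    )


segments = merge_segments(df)
print(segments)
```

The gap check (`start` not equal to the previous `end`) is what keeps shifts on different days, or two separate shifts on the same day, from being merged into one segment.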
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | William Rosenbaum |