'Realise accumulated DataFrame from a column of Boolean values
Be the following python pandas DataFrame:
ID | Holidays | visit_1 | visit_2 | visit_3 | other |
---|---|---|---|---|---|
0 | True | 1 | 2 | 0 | red |
0 | False | 3 | 2 | 0 | red |
0 | True | 4 | 4 | 1 | blue |
1 | False | 2 | 0 | 0 | red |
1 | True | 1 | 2 | 1 | green |
2 | False | 1 | 0 | 0 | red |
Currently I calculate a new DataFrame with the accumulated visit values as follows.
# Calculate the columns of the total visit count
visit_df = df.groupby('ID')[['visit_1', 'visit_2', 'visit_3']].sum()
I would like to create a new one taking into account only the rows whose Holiday value is True. How could I do this?
Solution 1:[1]
Simple subset the rows first:
df[df['Holidays']].groupby('ID')[['visit_1', 'visit_2', 'visit_3']].sum()
output:
visit_1 visit_2 visit_3
ID
0 5 6 1
1 1 2 1
Alternative if you want to also get the groups without any match:
df2 = df.set_index('ID')
(df2.where(df2['Holidays'])
.groupby('ID')[['visit_1', 'visit_2', 'visit_3']].sum()
)
output:
visit_1 visit_2 visit_3
ID
0 5.0 6.0 1.0
1 1.0 2.0 1.0
2 0.0 0.0 0.0
variant
df2 = df.set_index('ID')
(df2.where(df2['Holidays'])
.groupby('ID')[['visit_1', 'visit_2', 'visit_3']].sum()
.convert_dtypes()
.add_suffix('_Holidays')
)
output:
visit_1_Holidays visit_2_Holidays visit_3_Holidays
ID
0 5 6 1
1 1 2 1
2 0 0 0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |