'Realise accumulated DataFrame from a column of Boolean values

Be the following python pandas DataFrame:

ID Holidays visit_1 visit_2 visit_3 other
0 True 1 2 0 red
0 False 3 2 0 red
0 True 4 4 1 blue
1 False 2 0 0 red
1 True 1 2 1 green
2 False 1 0 0 red

Currently I calculate a new DataFrame with the accumulated visit values as follows.

# Calculate the columns of the total visit count
visit_df = df.groupby('ID')[['visit_1', 'visit_2', 'visit_3']].sum()

I would like to create a new one taking into account only the rows whose Holiday value is True. How could I do this?



Solution 1:[1]

Simple subset the rows first:

df[df['Holidays']].groupby('ID')[['visit_1', 'visit_2', 'visit_3']].sum()

output:

    visit_1  visit_2  visit_3
ID                           
0         5        6        1
1         1        2        1

Alternative if you want to also get the groups without any match:

df2 = df.set_index('ID')

(df2.where(df2['Holidays'])
   .groupby('ID')[['visit_1', 'visit_2', 'visit_3']].sum()
)

output:

    visit_1  visit_2  visit_3
ID                           
0       5.0      6.0      1.0
1       1.0      2.0      1.0
2       0.0      0.0      0.0
variant
df2 = df.set_index('ID')

(df2.where(df2['Holidays'])
   .groupby('ID')[['visit_1', 'visit_2', 'visit_3']].sum()
   .convert_dtypes()
   .add_suffix('_Holidays')
)

output:

    visit_1_Holidays  visit_2_Holidays  visit_3_Holidays
ID                                                      
0                  5                 6                 1
1                  1                 2                 1
2                  0                 0                 0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1