'Summing across in a dataframe with condition coming from another column

this is not a very good title for the question. I want to sum across certain columns in a data frame for each group, excluding one column for each of my groups. A simple example would be as follows:

df <- tibble(group_name = c("A", "B","C"), mean_A = c(1,2,3), mean_B = c(2,3,4), mean_C=c(3,4,5))
df %>% group_by(group_name) %>% mutate(m1 = sum(across(contains("mean"))))

This creates column m1, which is the sum across mean_a, mean_b, mean_c for each group. What I want to do is exclude mean_a for group a, mean_b for b and mean_c for c. The following does not work though (not surprisingly).

df %>% group_by(group_name) %>% mutate(m1 = sum(across(c(contains("mean") & !contains(group_name)))))

Do you have an idea how I could do this? My original data contains many more groups, so would be hard to do by hand.

Edit: I have tried the following way which solves it in a rudimentary fashion, but something (?grepl maybe) seems to not work great here and I get the wrong result.

df %>% pivot_longer(!group_name) %>% mutate(value2 = case_when(grepl(group_name, name) ~ 0, TRUE ~ value)) %>% group_by(group_name) %>% summarise(m1 = sum(value2))

Edit2: Found out what's wrong with the above, and below works, but still a lot of warnings so I recommend people to follow TarJae's response below

df %>% pivot_longer(!group_name) %>% group_by(group_name) %>% mutate(value2 = case_when(grepl(group_name, name) ~ 0, TRUE ~ value)) %>% group_by(group_name) %>% summarise(m1 = sum(value2))

r dplyr

Solution 1:^[1]

Here is another option where you can just use group_name directly with the tidyselect helpers:

df %>% 
  rowwise() %>% 
  mutate(m1 = rowSums(select(across(starts_with("mean")), -ends_with(group_name)))) %>% 
  ungroup()

Output

  group_name mean_A mean_B mean_C    m1
  <chr>       <dbl>  <dbl>  <dbl> <dbl>
1 A               1      2      3     5
2 B               2      3      4     6
3 C               3      4      5     7

How it works

The row-wise output of across is a 1-row tibble containing only the variables that start with "mean".
select unselects the subset of the variables from output by across that end with the value from group_name.
At this point you are left with a 1 x 2 tibble, which is then summed using rowSums.

Solution 2:^[2]

Here is one way how we could do it:

We create a helper column to match column names
We set value of mean column to zeor if column names matches helper name.
Then we use transmute with select to calculate rowSums
Finally we cbind column m1 to df:

library(dplyr)

df %>% 
  mutate(helper = paste0("mean_", group_name)) %>% 
  mutate(across(starts_with("mean"), ~ifelse(cur_column()==helper, 0, .))) %>%
  transmute(m1 = select(., contains("mean")) %>% 
           rowSums()) %>% 
  cbind(df)

  m1 group_name mean_a mean_b mean_c
1  5          a      1      2      3
2  6          b      2      3      4
3  7          c      3      4      5

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2	TarJae

'Summing across in a dataframe with condition coming from another column

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]