Using dplyr to calculate percent by group for every column without specifying the names?

This is similar to another question. However, what I'm interested in is calculating the percentage for every column. For example, with the code below I can calculate it for column S1 by explicitly listing it, but I want a way to do it for all columns without naming each one.

input <- 'Gene  Exon    S1  S2  S3
G1  E1  56  52  95
G1  E2  25  52  5
G1  E3  32  66  22
G2  E1  55  11  33
G2  E2  46  12  44'

library(dplyr)

df <- read.table(text = input, header = TRUE)
df$Exon <- NULL
df %>% group_by(Gene) %>% summarise(per = S1 / sum(S1))

The above computes the percentage for S1; however, when I tried using a period instead, it caused an error.

df %>% group_by(Gene) %>% summarise(per = . / sum(.))

Thanks in advance.



Solution 1:[1]

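You can use across() for this. It applies the same function to every selected column, whereas a bare . inside summarise() stands for the whole data frame rather than a single column: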

library(dplyr)
df %>%
  group_by(Gene) %>%
  summarize(across(matches("^S[0-9]+"), ~ . / sum(.)), .groups = "drop") 
# # A tibble: 5 x 4
#   Gene     S1    S2     S3
#   <chr> <dbl> <dbl>  <dbl>
# 1 G1    0.496 0.306 0.779 
# 2 G1    0.221 0.306 0.0410
# 3 G1    0.283 0.388 0.180 
# 4 G2    0.545 0.478 0.429 
# 5 G2    0.455 0.522 0.571 
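
If the columns do not share a naming pattern such as S1, S2, S3, a variation that may be worth trying is to select them by type with where(is.numeric) and to use mutate() instead of summarize(). This is only a sketch and not part of the original answer: it assumes every numeric column should be converted, and the names pct and check are just illustrative. mutate() keeps one row per input row, so Exon can stay in the data as a label. As far as I know, dplyr 1.1.0 and later also warn when summarise() returns more than one row per group and suggest reframe() instead; mutate() avoids that question entirely.

```r
library(dplyr)

# Same sample data as in the question
input <- "Gene Exon S1 S2 S3
G1 E1 56 52 95
G1 E2 25 52 5
G1 E3 32 66 22
G2 E1 55 11 33
G2 E2 46 12 44"
df <- read.table(text = input, header = TRUE)

pct <- df %>%
  group_by(Gene) %>%
  # where(is.numeric) picks every numeric column, so no names are hard-coded;
  # the grouping column Gene and the non-numeric Exon are left untouched
  mutate(across(where(is.numeric), ~ .x / sum(.x))) %>%
  ungroup()

# Sanity check: within each gene, every column of shares should sum to 1
check <- pct %>%
  group_by(Gene) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")
```

Printing pct should show the same shares as the output above, with the Exon column still attached to each row.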

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1