Using dplyr to calculate percent by group for every column without specifying the names?
This is similar to a related question. However, what I'm interested in is calculating the percentage for every column. For example, with the code below I can calculate the percentage for column S1 by explicitly listing it, but I want a way to do it for all columns without naming each one.
input <- 'Gene Exon S1 S2 S3
G1 E1 56 52 95
G1 E2 25 52 5
G1 E3 32 66 22
G2 E1 55 11 33
G2 E2 46 12 44'
library(dplyr)
df <- read.table(text = input, header = TRUE)
df$Exon <- NULL
df %>% group_by(Gene) %>% summarise(per = S1 / sum(S1))
The above calculates the percentages for S1; however, when I tried using a period in place of the column name, it caused an error.
df %>% group_by(Gene) %>% summarise ( per = . / sum (.) )
Thanks in advance.
Solution 1:[1]
You can use across() for this:
library(dplyr)
df %>%
  group_by(Gene) %>%
  summarize(across(matches("^S[0-9]+"), ~ . / sum(.)), .groups = "drop")
# # A tibble: 5 x 4
#   Gene     S1    S2     S3
#   <chr> <dbl> <dbl>  <dbl>
# 1 G1    0.496 0.306 0.779
# 2 G1    0.221 0.306 0.0410
# 3 G1    0.283 0.388 0.180
# 4 G2    0.545 0.478 0.429
# 5 G2    0.455 0.522 0.571
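If the sample columns do not share a naming pattern such as S1, S2, S3, one possible variation is to select them by type instead of by name. The sketch below is an assumption about what may suit the question, not part of the original answer: it uses where(is.numeric) to pick every numeric column and mutate() instead of summarize(), so the result keeps one row per input row and the Exon column can stay as a label. The object name pct is made up for illustration. As far as I know, dplyr 1.1.0 and later warn when summarise() returns more than one row per group and suggest reframe() for that case, which is another reason mutate() may be the safer choice here.

```r
library(dplyr)

# Same toy data as in the question.
input <- "Gene Exon S1 S2 S3
G1 E1 56 52 95
G1 E2 25 52 5
G1 E3 32 66 22
G2 E1 55 11 33
G2 E2 46 12 44"
df <- read.table(text = input, header = TRUE)

# where(is.numeric) selects S1, S2 and S3 without naming them.
# The grouping column (Gene) is never touched by across(), and Exon is
# character, so it passes through unchanged.
pct <- df %>%
  group_by(Gene) %>%
  mutate(across(where(is.numeric), ~ .x / sum(.x))) %>%
  ungroup()

print(pct)
```

Within each Gene, every sample column of pct should then sum to 1, which is a quick way to check the result.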
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |