'How can I use a vector variable to avoid repeating references to non-adjacent columns in R?

Question

In R, can I used a vector that holds the names of data frame columns to avoid repeated code?

vec_columns <- c("col1", "col2", "col8", "col54")

Background

I started off looking to solve the problem that is answered in this question, which works on its own terms: Coalesce columns and create another column to specify source

But in my specific use case, I have many columns and the ones that I want to coalesce() are not adjacent to each other, so the <tidy-select> used in that solution doesn't work for me.

Modified Example from Original Question

In the original question, the OP was using contiguous columns that began in column #1, but I have leading columns that are not part of the coalesce(), plus the columns I want to coalesce() are separated from each other.

df_2 <-
  data.frame(
    Name = c("A", "B", "C", "D", "E"),  #Adding a name column not in original posted question
    group_1 = c(NA, NA, NA, NA, 2),
    group_2 = c(NA, 4, NA, NA, 1),
    group_3 = c(NA, NA, 5, NA, NA),
    group_4 = c(1, NA, NA, 2, NA),
    group_5 = c(NA, 3, NA, NA, NA)
  )

> df_2
  Name group_1 group_2 group_3 group_4 group_5
1    A      NA      NA      NA       1      NA
2    B      NA       4      NA      NA       3
3    C      NA      NA       5      NA      NA
4    D      NA      NA      NA       2      NA
5    E       2       1      NA      NA      NA

This solution below I created and does exactly what I want for output:

df_2 %>% 
 mutate(one_col = coalesce(group_2, group_3, group_5)) %>% 
 rowwise() %>%
 mutate(group_col = c("group_2", "group_3", "group_5")[!is.na(c_across(c(group_2, group_3, group_5)))][1])

# A tibble: 5 x 8
# Rowwise: 
  Name  group_1 group_2 group_3 group_4 group_5 one_col group_col
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>    
1 A          NA      NA      NA       1      NA      NA NA       
2 B          NA       4      NA      NA       3       4 group_2  
3 C          NA      NA       5      NA      NA       5 group_3  
4 D          NA      NA      NA       2      NA      NA NA       
5 E           2       1      NA      NA      NA       1 group_2  

The Problem

But as you can see, I have to repeat those column names 3 times. Future proofing myself, I envision where I might want or 10 columns in the coalesce(). I want to just set a variable holding a vector of column names once, but doing the obvious myvec <- c("group_2", "group_3", "group_5") and inserting it doesn't work.

EDIT

Reading the comments, I got to an answer that meets the need, but I'm reluctant to answer it myself because credit goes to the commenter(s).

This achieves what I wanted:

myvec <- c("group_2", "group_3","group_5")

df_2 %>% 
  mutate(one_col = coalesce(!!!select(.,myvec))) %>% rowwise() %>% 
  mutate(group_col = myvec[!is.na(c_across(myvec))][1])

# A tibble: 5 x 8
# Rowwise: 
  Name  group_1 group_2 group_3 group_4 group_5 one_col group_col
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>    
1 A          NA      NA      NA       1      NA      NA NA       
2 B          NA       4      NA      NA       3       4 group_2  
3 C          NA      NA       5      NA      NA       5 group_3  
4 D          NA      NA      NA       2      NA      NA NA       
5 E           2       1      NA      NA      NA       1 group_2  

The main goal here is really to get the group_col column which records which column was used in the coalesce(), but with the flexibility to handle non-adjacent columns, and also to change the order of the columns in the coalesce(). For example, below I reverse the order of the myvec so that row 2, "B" selects 3 and group_5, instead of 4 and group_2 in the above example.

> myvec2 <- c("group_5", "group_3","group_2")
> df_2 %>% 
+   mutate(one_col = coalesce(!!!select(.,myvec2))) %>% rowwise() %>% 
+   mutate(group_col = myvec2[!is.na(c_across(myvec2))][1])
Note: Using an external vector in selections is ambiguous.
i Use `all_of(myvec2)` instead of `myvec2` to silence this message.
i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.
# A tibble: 5 x 8
# Rowwise: 
  Name  group_1 group_2 group_3 group_4 group_5 one_col group_col
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>    
1 A          NA      NA      NA       1      NA      NA NA       
2 B          NA       4      NA      NA       3       3 group_5  
3 C          NA      NA       5      NA      NA       5 group_3  
4 D          NA      NA      NA       2      NA      NA NA       
5 E           2       1      NA      NA      NA       1 group_2  

I am mildly concerned about the message I received about external vector in selections is ambiguous, so perhaps I'll need to use that all_of() function, but this did work.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source