How do you identify column numbers based on a partial string match in a way that can be used with lapply?
I have a df with about 1200 columns, and the column names are related to the response categories (i.e. Likert scale identifiers).
It is similar in format to the df created below (edit: I have now included a snippet of the actual data frame):
structure(list(`mission support` = c("Strongly Support", "Strongly Support",
"Support", "Support", "Strongly Support", "Support", "Support",
"Strongly Support", "Support", "Strongly Support", "Support",
"Strongly Support", "Support", "Strongly Support", "Strongly Support",
"Strongly Support", "Strongly Support", "Neither oppose nor support",
"Strongly Support", "Strongly Support", "Support", "Strongly Support",
"Support", "Strongly Support", "Support"), `mission opposition` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
`ed support` = c("Strongly Support", "Strongly Support",
"Support", "Support", "Strongly Support", "Support", "Support",
"Strongly Support", "Support", "Support", "Strongly Support",
"Strongly Support", "Support", "Support", "Support", "Strongly Support",
"Strongly Support", "Neither oppose nor support", "Strongly Support",
"Strongly Support", "Support", "Strongly Support", "Support",
"Support", "Support"), `ed mission opposition` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), `non-agency engagement` = c("Yes", "No", "No", "No", "Maybe",
"No", "No", "No", "No", "Maybe", "No", "No", "Yes", "No",
"No", "No", "No", "No", "Yes", "Yes", "Yes", "No", "No",
"No", "No"), `agency knowledge` = c("Yes", "Yes", "Yes",
"Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", "Yes", "Yes",
"Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",
"No", "Yes", "Yes", "Yes"), `agency engagement` = c("Yes",
"Yes", "Yes", "Yes", "Yes", NA, NA, "Yes", "Yes", "Yes",
"Maybe", "No", "Yes", NA, "No", "Yes", "Yes", "Yes", "No",
"Yes", "Yes", NA, "Yes", "Yes", "Yes")), row.names = c(NA,
-25L), class = c("tbl_df", "tbl", "data.frame"))
What I am trying to do is identify the column numbers based on a partial string match, so that I can use lapply to assign factor levels etc. without having to identify each column number individually.
I have tried grep (see below), which returns the column numbers,
cols <- c(grep("support", colnames(df)))
but not in a way that can be used successfully with the lapply function.
df[cols] <- lapply(df[cols], factor,
                   levels = c("Strongly Oppose", "Oppose", "Neither oppose nor support", "Support", "Strongly Support"),
                   labels = c("Strongly Oppose", "Oppose", "Neither oppose nor support", "Support", "Strongly Support"))
It works for the first column identified, but not the additional columns.
Any idea on how to get this to work?
Solution 1:[1]
I'm not sure what your expected output is, but I can't reproduce the problem described above. Your approach seems to work here, even with non-syntactic names:
cols <- c(grep("support", colnames(df)))
df[cols] <- lapply(df[cols], factor,
                   levels = c("Strongly Oppose", "Oppose", "Neither oppose nor support", "Support", "Strongly Support"),
                   labels = c("Strongly Oppose", "Oppose", "Neither oppose nor support", "Support", "Strongly Support"))
str(df)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 25 obs. of 7 variables:
#> $ mission support : Factor w/ 5 levels "Strongly Oppose",..: 5 5 4 4 5 4 4 5 4 5 ...
#> $ mission opposition : chr NA NA NA NA ...
#> $ ed support : Factor w/ 5 levels "Strongly Oppose",..: 5 5 4 4 5 4 4 5 4 4 ...
#> $ ed mission opposition: chr NA NA NA NA ...
#> $ non-agency engagement: chr "Yes" "No" "No" "No" ...
#> $ agency knowledge : chr "Yes" "Yes" "Yes" "Yes" ...
#> $ agency engagement : chr "Yes" "Yes" "Yes" "Yes" ...
Created on 2021-12-28 by the reprex package (v0.3.0)
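If only some columns seem to be converted on the full data, one thing worth checking is which names the pattern actually matches, because grep is case-sensitive by default. The following is a minimal sketch on a small made-up data frame; `toy`, its column names (including the capitalised "Ed Support") and `lv` are hypothetical and not taken from the question:

```r
# Hypothetical toy data; "Ed Support" is invented to illustrate a case mismatch
toy <- data.frame(
  `mission support`  = c("Support", "Strongly Support"),
  `Ed Support`       = c("Oppose", "Support"),
  `agency knowledge` = c("Yes", "No"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# value = TRUE returns the matched names instead of their positions,
# which makes it easy to see what the pattern picked up
matched_default <- grep("support", colnames(toy), value = TRUE)
matched_nocase  <- grep("support", colnames(toy), ignore.case = TRUE, value = TRUE)

lv <- c("Strongly Oppose", "Oppose", "Neither oppose nor support",
        "Support", "Strongly Support")

# Indexing by name works the same way as indexing by position
toy[matched_nocase] <- lapply(toy[matched_nocase], factor, levels = lv)
```

Here the default call misses the capitalised column, while `ignore.case = TRUE` picks up both. The `labels` argument is left out because it defaults to the levels when they are identical.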
Another option is to use dplyr::across instead of lapply:
library(dplyr)
df %>%
  mutate(across(contains("support"),
                factor,
                levels = c("Strongly Oppose", "Oppose", "Neither oppose nor support", "Support", "Strongly Support"),
                labels = c("Strongly Oppose", "Oppose", "Neither oppose nor support", "Support", "Strongly Support"))
  ) %>%
  glimpse()
#> Rows: 25
#> Columns: 7
#> $ `mission support` <fct> Strongly Support, Strongly Support, Support, S…
#> $ `mission opposition` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ `ed support` <fct> Strongly Support, Strongly Support, Support, S…
#> $ `ed mission opposition` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ `non-agency engagement` <chr> "Yes", "No", "No", "No", "Maybe", "No", "No", …
#> $ `agency knowledge` <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No",…
#> $ `agency engagement` <chr> "Yes", "Yes", "Yes", "Yes", "Yes", NA, NA, "Ye…
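As far as I know, passing extra arguments to the function through across() is deprecated from dplyr 1.1.0 onwards, so on newer versions the same idea may be better written with an explicit function. A minimal sketch, assuming dplyr is installed; `toy`, `toy_fct` and `lv` are hypothetical names on made-up data, not the asker's data frame:

```r
library(dplyr)

# Hypothetical toy data with non-syntactic column names
toy <- tibble(
  `mission support`  = c("Support", "Strongly Support"),
  `ed support`       = c("Oppose", NA),
  `agency knowledge` = c("Yes", "No")
)

lv <- c("Strongly Oppose", "Oppose", "Neither oppose nor support",
        "Support", "Strongly Support")

# matches() takes a regular expression, contains() a literal string;
# both ignore case by default, unlike grep()
toy_fct <- toy %>%
  mutate(across(matches("support$"), function(x) factor(x, levels = lv)))
```

The anchored pattern `support$` only selects names that end in "support", which can help when a plain substring would also hit unrelated columns among many.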
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | TimTeaFan |