R: identifying cases where the row contains more than one character from a list
I'm working on a large dataset where I need to create a new column and assign appropriate labels. For instance, I have rows containing a mixture of fruits and vegetables, and I want a new column that identifies each row as fruit, vegetable, or mixed. The ideal outcome is:
#          name      type
# 1       apple     fruit
# 2     bananas     fruit
# 3        kale vegetable
# 4 apple, kale     mixed
While I can identify the type for rows containing a single item, I'm having trouble coding the "mixed" type. I was thinking of using AND (for example, mixed = fruit AND vegetable), but this didn't work. Is there an efficient or correct way to code for mixed instances? I appreciate your help, thank you!
So right now what I have is:
df <- data.frame(name = c('apple','bananas','kale','apple, kale'),
                 stringsAsFactors = FALSE)
df$type <- factor(df$name) # First step: copy vector and make it factor
# Change levels:
levels(df$type) <- list(
  fruit = c("apple", "bananas"),
  vegetable = c("kale")
)
Solution 1:[1]
Here is a solution using the purrr and dplyr packages:
library(dplyr)
library(purrr)
l <- list(
  fruit = c("apple", "bananas"),
  vegetable = c("kale")
)
v <- unlist(l)
nm <- gsub("\\d", "", setNames(names(v), v))
df %>%
  # unique() keeps the result at length one when a row lists several items of the same type
  mutate(type = map_chr(strsplit(name, ", "), ~ if (n_distinct(nm[.x]) > 1) "mixed" else unique(nm[.x])))
Output
         name      type
1       apple     fruit
2     bananas     fruit
3        kale vegetable
4 apple, kale     mixed
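The AND idea from the question can also be made to work directly. The following is a minimal base R sketch, not part of the original answer: it tests each row for any fruit and for any vegetable with `grepl()`, then combines the two logical vectors with `&`. The helper `any_of()` and the vectors `fruits` and `vegetables` are names introduced here for illustration, and the pattern assumes the item names contain no regex metacharacters.

```r
# Lookup vectors (same items as in the answer above)
fruits <- c("apple", "bananas")
vegetables <- c("kale")

df <- data.frame(name = c("apple", "bananas", "kale", "apple, kale"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: builds a pattern such as "\\b(apple|bananas)\\b"
# that matches any one of the items as a whole word
any_of <- function(items) paste0("\\b(", paste(items, collapse = "|"), ")\\b")

has_fruit <- grepl(any_of(fruits), df$name, perl = TRUE)
has_veg   <- grepl(any_of(vegetables), df$name, perl = TRUE)

# "mixed" is literally fruit AND vegetable; rows matching neither become NA
df$type <- ifelse(has_fruit & has_veg, "mixed",
           ifelse(has_fruit, "fruit",
           ifelse(has_veg, "vegetable", NA_character_)))
df
```

This works well for two categories; with many categories the nested `ifelse()` calls grow quickly, and a lookup-based approach scales better.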
How it works
- Create a named vector to help translate `name` into `type`. For example, `nm["apple"]` returns `fruit`.
- Then we split `name` into a list:
strsplit(df$name, ", ")
[[1]]
[1] "apple"
[[2]]
[1] "bananas"
[[3]]
[1] "kale"
[[4]]
[1] "apple" "kale"
- Lastly, we map a `purrr`-style function over this list. If the number of unique types returned is greater than one, the value is `mixed`; otherwise we return the value from the named vector `nm`.
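The three steps above can also be sketched in base R alone, which may help if `dplyr` and `purrr` are not available. This is an illustrative variant rather than the answer's own code: the helper `classify()` is a name introduced here, and the extra row `"apple, bananas"` is added only to show a row with two items of the same type.

```r
df <- data.frame(name = c("apple", "bananas", "kale", "apple, kale", "apple, bananas"),
                 stringsAsFactors = FALSE)

l <- list(
  fruit = c("apple", "bananas"),
  vegetable = c("kale")
)

# Step 1: named lookup vector from item to type, e.g. nm["apple"] is "fruit".
# rep() repeats each type name once per item, so no numeric suffixes need stripping.
nm <- setNames(rep(names(l), lengths(l)), unlist(l, use.names = FALSE))

# Step 2: split each row into its individual items
items <- strsplit(df$name, ", ", fixed = TRUE)

# Step 3: hypothetical helper that reduces the items of one row to a single label
classify <- function(x) {
  types <- unique(unname(nm[x]))
  if (length(types) == 0) return(NA_character_)  # empty row
  if (length(types) > 1) "mixed" else types
}

df$type <- vapply(items, classify, character(1))
df
```

Note that an item missing from the lookup yields `NA`, so a row combining a known and an unknown item is labelled `mixed` in this sketch; whether that is desirable depends on the data.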
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
