R: identifying cases where the row contains more than one character from a list
I'm working on a large dataset where I need to create a new column and assign appropriate labels. For instance, I have rows containing a mixture of fruits and vegetables, and I want a new column that labels each row as fruit, vegetable, or mixed. The ideal outcome is:
# name type
# 1 apple fruit
# 2 bananas fruit
# 3 kale vegetable
# 4 apple, kale mixed
While I can identify the type of rows containing a single "item", I'm having trouble coding the "mixed" type. I was thinking of using AND (for example, mixed = fruit AND vegetable), but this didn't work. Is there an efficient, correct way to code for mixed instances? I appreciate your help, thank you!
So right now what I have is:
df <- data.frame(name = c('apple','bananas','kale','apple, kale'),
                 stringsAsFactors = FALSE)
df$type <- factor(df$name) # First step: copy vector and make it factor
# Change levels:
levels(df$type) <- list(
  fruit = c("apple", "bananas"),
  vegetable = c("kale")
)
# "apple, kale" matches no entry in the list, so its type becomes NA
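The AND idea can work once each row is tested item by item rather than as one whole string: split the row, then ask whether any item is a fruit AND any item is a vegetable. A minimal base-R sketch, assuming items are separated by ", " as in the example data; fruits, vegetables, has_fruit and has_veg are illustrative names, not part of the original code:

```r
# Hypothetical lookup sets for the two categories
fruits <- c("apple", "bananas")
vegetables <- c("kale")

df <- data.frame(name = c("apple", "bananas", "kale", "apple, kale"),
                 stringsAsFactors = FALSE)

# One character vector of items per row
items <- strsplit(df$name, ", ")

# TRUE when at least one item in the row belongs to the category
has_fruit <- vapply(items, function(x) any(x %in% fruits), logical(1))
has_veg   <- vapply(items, function(x) any(x %in% vegetables), logical(1))

# "mixed" is literally fruit AND vegetable; unknown rows get NA
df$type <- ifelse(has_fruit & has_veg, "mixed",
           ifelse(has_fruit, "fruit",
           ifelse(has_veg, "vegetable", NA)))
df
```

Each category needs its own logical vector here, so this gets verbose with many categories; a lookup-based approach scales better.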
Solution 1:[1]
Here is a solution using the purrr and dplyr packages:
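library(dplyr)
library(purrr)

# Each type and the items that belong to it
l <- list(
  fruit = c("apple", "bananas"),
  vegetable = c("kale")
)

# Named lookup vector: names are items, values are types
# (unlist() names the values "fruit1", "fruit2", ...; gsub() drops the digits)
v <- unlist(l)
nm <- gsub("\\d", "", setNames(names(v), v))

df %>%
  mutate(type = map_chr(strsplit(name, ", "), ~ {
    types <- unique(nm[.x])                    # distinct types in this row
    if (length(types) > 1) "mixed" else types  # exactly one value, as map_chr() requires
  }))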
Output
name type
1 apple fruit
2 bananas fruit
3 kale vegetable
4 apple, kale mixed
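For comparison, here is a base-R sketch of the same idea using vapply() instead of map_chr(), so it needs no extra packages. It assumes items are separated by ", " and adds a hypothetical fifth row, "apple, bananas", to show that two items of the same type are labelled with that single type rather than "mixed". The names lookup and classify are illustrative:

```r
# Hypothetical lookup vector: item -> type
lookup <- c(apple = "fruit", bananas = "fruit", kale = "vegetable")

df <- data.frame(name = c("apple", "bananas", "kale", "apple, kale", "apple, bananas"),
                 stringsAsFactors = FALSE)

# Label one row, given its vector of items
classify <- function(items) {
  types <- unique(lookup[items])           # distinct types among the items
  if (length(types) > 1) "mixed" else types
}

df$type <- vapply(strsplit(df$name, ", "), classify, character(1))
df
```

Items missing from the lookup come back as NA, so a row combining an unknown item with a known one is labelled "mixed"; check for NA first if that matters for your data.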
How it works
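- Create a named vector, nm, to translate name into type. For example, nm["apple"] returns "fruit". Because unlist() appends a running number to the names of list elements that hold more than one item ("fruit1", "fruit2"), the gsub() call strips those digits off again.
- Then we split name into a list:
strsplit(df$name, ", ")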
[[1]]
[1] "apple"
[[2]]
[1] "bananas"
[[3]]
[1] "kale"
[[4]]
[1] "apple" "kale"
- Lastly, we map a purrr-style function (the ~ formula) over this list. If the items in a row map to more than one distinct type, the value is "mixed"; otherwise we return that single type, looked up in the named vector nm.
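The lookup vector from the first step can also be built without the digit-stripping trick: repeat each type name once per item with rep() and lengths(), then name the result by the items. This is a sketch of an alternative, not the answer's own code; nm2 is an illustrative name. Because no digits are ever appended, nothing needs stripping, and a type name that itself contains digits would survive intact:

```r
l <- list(
  fruit = c("apple", "bananas"),
  vegetable = c("kale")
)

# Answer's version: unlist() yields names "fruit1", "fruit2", "vegetable",
# so the digits must be removed from the type names afterwards
v <- unlist(l)
nm <- gsub("\\d", "", setNames(names(v), v))

# Alternative: one copy of each type name per item, named by the items
nm2 <- setNames(rep(names(l), lengths(l)), unlist(l, use.names = FALSE))

nm2
nm2["apple"]
```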
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Stack Overflow |