R: identifying cases where the row contains more than one character from a list

I'm working on a large dataset where I need to create a new column and assign appropriate labels. For instance, I have rows containing a mixture of fruits and vegetables, and I want a new column that identifies each row as fruit, vegetable, or mixed. The ideal outcome is:

#          name      type
# 1       apple     fruit
# 2     bananas     fruit
# 3        kale vegetable
# 4 apple, kale     mixed

While I can identify the type for rows containing a single item, I'm having trouble coding the "mixed" type. I tried using AND (for example, mixed = fruit AND vegetable), but this didn't work. Is there an efficient and correct way to code the mixed cases? I appreciate your help, thank you!

So right now what I have is:

df <- data.frame(name = c('apple','bananas','kale','apple, kale'), 
             stringsAsFactors = FALSE)
df$type <- factor(df$name) # First step: copy vector and make it factor

#Change levels:
levels(df$type) <- list(
    fruit = c("apple", "bananas"),
    vegetable = c("kale")
)


Solution 1:[1]

Here is a solution using the purrr and dplyr packages:

library(dplyr)
library(purrr)

l <- list(
  fruit = c("apple", "bananas"),
  vegetable = c("kale")
)

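# Flatten the list; unlist() names the elements fruit1, fruit2, vegetable
v <- unlist(l)
# Invert into a lookup (item -> type) and strip the numeric suffixes
nm <- gsub("\\d", "", setNames(names(v), v))

df %>%
  mutate(type = map_chr(
    strsplit(name, ", "),
    # .x[1] keeps the result length 1 when a row lists several items of one type
    ~ if (n_distinct(nm[.x]) > 1) "mixed" else unname(nm[.x[1]])
  ))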

Output

         name      type
1       apple     fruit
2     bananas     fruit
3        kale vegetable
4 apple, kale     mixed

How it works

  1. Create a named vector to help translate name into type. For example, nm["apple"] returns fruit.
  2. Then we split name into a list:
strsplit(df$name, ", ")
[[1]]
[1] "apple"

[[2]]
[1] "bananas"

[[3]]
[1] "kale"

[[4]]
[1] "apple" "kale" 
  3. Lastly, we map a purrr-style function over this list. If the number of unique types returned is greater than one, the value is mixed; otherwise we return the value from the named vector nm.
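
The three steps above can also be sketched in base R, without purrr or dplyr. This is only an illustrative sketch, not part of the original answer: the helper name classify_items and the "unknown" label for items missing from the lookup are assumptions. It returns a single label even when a row lists several items of the same type (for example "apple, bananas").

```r
# Lookup list from the answer: type -> items
l <- list(
  fruit = c("apple", "bananas"),
  vegetable = c("kale")
)

# Step 1: invert the list into a named vector, item -> type
nm <- setNames(rep(names(l), lengths(l)), unlist(l, use.names = FALSE))

# Hypothetical helper (name and "unknown" label are assumptions)
classify_items <- function(name, lookup, sep = ", ") {
  # Step 2: split each row into its individual items
  items <- strsplit(name, sep, fixed = TRUE)
  # Step 3: reduce each row to exactly one label
  vapply(items, function(x) {
    types <- unique(unname(lookup[x]))
    if (length(types) == 0 || anyNA(types)) {
      "unknown"   # at least one item is not in the lookup (or the row is empty)
    } else if (length(types) > 1) {
      "mixed"     # items from more than one type
    } else {
      types       # all items share a single type
    }
  }, character(1))
}

df <- data.frame(name = c("apple", "bananas", "kale", "apple, kale"),
                 stringsAsFactors = FALSE)
df$type <- classify_items(df$name, nm)
```

In this sketch an unrecognised item takes precedence over "mixed", so "apple, potato" is labelled "unknown" rather than silently counted as a second type. Whether that is the right choice depends on how clean the real data is.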

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1