Using str_detect and case_when with dplyr in R

This is my df:

mydf <- structure(list(Action = c("Passes accurate", "Passes accurate", 
"Passes accurate", "Passes accurate", "Lost balls", "Lost balls (in opp. half)", 
"Passes (inaccurate)", "Interceptions (in opp. half)", "Interceptions", 
"Positional attacks")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

I have this vector: passes <- c('Passes','passes','Assists','Crosses')

I am trying to do this: mydf %>% mutate(newcol = case_when(str_detect(Action, passes) ~ 'passes'))

But only the first row is filled with passes. I expect the first 4 rows, and also the 7th row, to be filled with passes. How can I achieve this with the case_when function?



Solution 1:[1]

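I used str_sub() for this. It compares the first six characters of Action with 'Passes', so it only covers values that start with that exact word and ignores the other entries in passes.

# assign the result back so that print() shows the new column
mydf <- mydf %>% mutate(newcol = case_when(str_sub(Action,1,6) == 'Passes' ~ "passes"))

print(mydf)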
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   NA    
 6 Lost balls (in opp. half)    NA    
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) NA    
 9 Interceptions                NA    
10 Positional attacks           NA 

Solution 2:[2]

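You can collapse the vector into a single regular expression with '|' (alternation), so that every row is tested against all patterns at once: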

library(tidyverse)
mydf %>%
  mutate(newcol = if_else(str_detect(Action, paste0(passes, collapse = '|')), 'passes', NA_character_))

# A tibble: 10 x 2
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   <NA>  
 6 Lost balls (in opp. half)    <NA>  
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) <NA>  
 9 Interceptions                <NA>  
10 Positional attacks           <NA>  
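
To see why the original call fills only the first row, here is a minimal sketch that makes the pairing explicit. str_detect() is vectorised over both arguments, so a pattern vector is matched element by element against the strings instead of "any of these". The rep_len() step is my own illustration of that pairing (older stringr versions recycle the shorter vector themselves, while newer releases may reject the mismatched lengths with an error); the names paired_pattern, pairwise_hit, any_pattern and collapsed_hit are not from the question.

```r
library(stringr)

action <- c("Passes accurate", "Passes accurate", "Passes accurate",
            "Passes accurate", "Lost balls", "Lost balls (in opp. half)",
            "Passes (inaccurate)", "Interceptions (in opp. half)",
            "Interceptions", "Positional attacks")
passes <- c("Passes", "passes", "Assists", "Crosses")

# Make the element-wise pairing explicit: row 1 is tested against "Passes",
# row 2 against "passes", row 3 against "Assists", row 4 against "Crosses",
# row 5 against "Passes" again, and so on.
paired_pattern <- rep_len(passes, length(action))
pairwise_hit <- str_detect(action, paired_pattern)

# Collapsing yields ONE pattern that is tested against every row.
any_pattern <- paste0(passes, collapse = "|")
collapsed_hit <- str_detect(action, any_pattern)

which(pairwise_hit)   # rows matched by the element-wise pairing
which(collapsed_hit)  # rows matched by the collapsed pattern
```

With the pairing, only row 1 happens to meet a pattern it contains; with the collapsed pattern, rows 1 to 4 and row 7 match.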

Solution 3:[3]

Another option is to use fuzzyjoin:

library(fuzzyjoin)
library(dplyr)
regex_left_join(mydf, tibble(passes, newcol = "passes"),
    by = c("Action" = "passes")) %>%
   select(-passes)

-output

# A tibble: 10 × 2
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   <NA>  
 6 Lost balls (in opp. half)    <NA>  
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) <NA>  
 9 Interceptions                <NA>  
10 Positional attacks           <NA>  

Solution 4:[4]

You'll need paste(collapse = "|") to collapse the vector into a single string whose elements are separated by "|". In a regular expression "|" means "or", so grepl() checks each value of "Action" against element 1 or element 2 or element 3, and so on.

library(dplyr)

passes <- c('Passes','passes','Assists','Crosses')

mydf %>% mutate(newcol = case_when(grepl(paste(passes, collapse = "|"), Action) ~ "passes"))

# A tibble: 10 x 2
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   NA    
 6 Lost balls (in opp. half)    NA    
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) NA    
 9 Interceptions                NA    
10 Positional attacks           NA    
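
The same pattern extends to several groups, which is where case_when() is more convenient than a single if_else(). The sketch below assumes a second, purely hypothetical vector losses and a fallback label "other"; neither is part of the question, and the conditions are evaluated in order, so the first match wins.

```r
library(dplyr)

mydf <- tibble(Action = c("Passes accurate", "Passes accurate",
  "Passes accurate", "Passes accurate", "Lost balls",
  "Lost balls (in opp. half)", "Passes (inaccurate)",
  "Interceptions (in opp. half)", "Interceptions", "Positional attacks"))

passes <- c("Passes", "passes", "Assists", "Crosses")
losses <- c("Lost balls")  # hypothetical second group, for illustration only

result <- mydf %>%
  mutate(newcol = case_when(
    # one collapsed regex per group; each row is tested against the whole group
    grepl(paste(passes, collapse = "|"), Action) ~ "passes",
    grepl(paste(losses, collapse = "|"), Action) ~ "losses",
    # fallback for rows that match no group (instead of NA)
    TRUE ~ "other"
  ))

print(result)
```

Dropping the TRUE ~ "other" line leaves unmatched rows as NA, as in the output above.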

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 JAdel
Solution 2 deschen
Solution 3 akrun
Solution 4