In dplyr, using str_detect and case_when in R
This is my df:
mydf <- structure(list(Action = c("Passes accurate", "Passes accurate", 
"Passes accurate", "Passes accurate", "Lost balls", "Lost balls (in opp. half)", 
"Passes (inaccurate)", "Interceptions (in opp. half)", "Interceptions", 
"Positional attacks")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))
I have this vector: passes <- c('Passes','passes','Assists','Crosses')
I am trying to do this: mydf %>% mutate(newcol = case_when(str_detect(Action, passes) ~ 'passes'))
But only the first row gets filled with "passes". I expected the first four rows, and also the 7th row, to be filled with "passes". How can I achieve this with the case_when() function?
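The symptom comes from how str_detect() vectorises: it pairs the i-th string with the i-th pattern instead of testing every pattern against every string. Here is a minimal illustration with two rows and two patterns of equal length, so no recycling is involved:

```r
library(stringr)

# Equal-length inputs: str_detect() compares row 1 with pattern 1
# and row 2 with pattern 2, element by element.
actions  <- c("Passes accurate", "Passes accurate")
patterns <- c("Passes", "passes")

paired <- str_detect(actions, patterns)
paired
# Matching is case-sensitive, so the second pair fails:
# [1]  TRUE FALSE
```

With 10 rows and 4 patterns, the lengths are incompatible. Depending on the stringr version, the patterns are either recycled with a warning, which is why only row 1 matched, or the call raises an error (stringr 1.5.0 and later only recycle length-1 inputs).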
Solution 1:[1]
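I used str_sub() to check whether Action starts with "Passes". Note that this tests only that one prefix and ignores the other entries of the passes vector.
library(dplyr)
library(stringr)
mydf <- mydf %>% mutate(newcol = case_when(str_sub(Action, 1, 6) == 'Passes' ~ "passes"))
print(mydf)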
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   NA    
 6 Lost balls (in opp. half)    NA    
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) NA    
 9 Interceptions                NA    
10 Positional attacks           NA 
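The str_sub() check above hard-codes a single prefix. Below is a minimal sketch that keeps the "starts with" behaviour but uses every entry of passes. It assumes a match should only count at the start of Action, and it anchors a collapsed alternation with ^:

```r
library(dplyr)
library(stringr)

mydf <- tibble(Action = c(
  "Passes accurate", "Passes accurate", "Passes accurate", "Passes accurate",
  "Lost balls", "Lost balls (in opp. half)", "Passes (inaccurate)",
  "Interceptions (in opp. half)", "Interceptions", "Positional attacks"
))
passes <- c("Passes", "passes", "Assists", "Crosses")

# Builds "^(Passes|passes|Assists|Crosses)": one alternation anchored to the start
starts_pattern <- paste0("^(", paste(passes, collapse = "|"), ")")

# case_when() without a fallback leaves non-matching rows as NA
result <- mydf %>%
  mutate(newcol = case_when(str_detect(Action, starts_pattern) ~ "passes"))
result
```

Without the ^ anchor, this becomes the same "contains" test used in the other answers.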
Solution 2:[2]
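Collapse the vector into a single regular expression with paste0(collapse = '|'), so that str_detect() checks every pattern against every row: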
library(tidyverse)
mydf %>%
  mutate(newcol = if_else(str_detect(Action, paste0(passes, collapse = '|')), 'passes', NA_character_))
# A tibble: 10 x 2
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   <NA>  
 6 Lost balls (in opp. half)    <NA>  
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) <NA>  
 9 Interceptions                <NA>  
10 Positional attacks           <NA>  
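The question asks for case_when(), and the collapsed pattern works there too. The sketch below assumes case should be ignored. It uses stringr's regex(ignore_case = TRUE), so "Passes" and "passes" no longer need separate entries. The shorter passes_ci vector is a hypothetical replacement for passes:

```r
library(dplyr)
library(stringr)

mydf <- tibble(Action = c(
  "Passes accurate", "Passes accurate", "Passes accurate", "Passes accurate",
  "Lost balls", "Lost balls (in opp. half)", "Passes (inaccurate)",
  "Interceptions (in opp. half)", "Interceptions", "Positional attacks"
))

# Hypothetical case-insensitive pattern list: "passes" also covers "Passes"
passes_ci <- c("passes", "assists", "crosses")
pattern_ci <- regex(paste(passes_ci, collapse = "|"), ignore_case = TRUE)

result <- mydf %>%
  mutate(newcol = case_when(
    str_detect(Action, pattern_ci) ~ "passes",
    TRUE ~ NA_character_  # rows with no match stay NA
  ))
result
```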
Solution 3:[3]
Another option is a regex join with the fuzzyjoin package, which tests each pattern in passes against Action:
library(fuzzyjoin)
library(dplyr)
regex_left_join(mydf, tibble(passes, newcol = "passes"),
  by = c("Action" = "passes")) %>%
  select(-passes)
Output:
# A tibble: 10 × 2
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   <NA>  
 6 Lost balls (in opp. half)    <NA>  
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) <NA>  
 9 Interceptions                <NA>  
10 Positional attacks           <NA>  
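A regex join returns one row per matching pattern. A row that matches several entries of the lookup table is therefore duplicated. That does not happen above, because matching is case-sensitive and each row matches at most one pattern. With ignore_case = TRUE, however, "Passes accurate" matches both "Passes" and "passes". The sketch below guards against this with a hypothetical row_id column and keeps only the first match per original row:

```r
library(dplyr)
library(fuzzyjoin)

mydf <- tibble(Action = c(
  "Passes accurate", "Passes accurate", "Passes accurate", "Passes accurate",
  "Lost balls", "Lost balls (in opp. half)", "Passes (inaccurate)",
  "Interceptions (in opp. half)", "Interceptions", "Positional attacks"
))

lookup <- tibble(pattern = c("Passes", "passes", "Assists", "Crosses"),
                 newcol  = "passes")

result <- mydf %>%
  mutate(row_id = row_number()) %>%            # hypothetical id for the original rows
  regex_left_join(lookup, by = c(Action = "pattern"), ignore_case = TRUE) %>%
  distinct(row_id, .keep_all = TRUE) %>%       # keep the first match per row
  arrange(row_id) %>%                          # restore the original row order
  select(-row_id, -pattern)
result
```

distinct() cannot be applied to Action alone here, because rows 1 to 4 have identical Action values and would be collapsed into one.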
Solution 4:[4]
Use paste(collapse = "|") to collapse the vector into a single string separated by "|". grepl() treats that string as a regular expression, so it looks for element 1 or element 2 or element 3, and so on, in Action.
library(dplyr)
passes <- c('Passes','passes','Assists','Crosses')
mydf %>% mutate(newcol = case_when(grepl(paste(passes, collapse = "|"), Action) ~ "passes"))
# A tibble: 10 x 2
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   NA    
 6 Lost balls (in opp. half)    NA    
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) NA    
 9 Interceptions                NA    
10 Positional attacks           NA    
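The collapsed string is interpreted as a regular expression, so an entry containing metacharacters such as ( or . would change what the pattern matches. Below is a sketch of a regex-free alternative, assuming literal substring matching is what's wanted. It runs one grepl(fixed = TRUE) per entry and ORs the results together with Reduce(). The helper name matches_any is hypothetical:

```r
library(dplyr)

mydf <- tibble(Action = c(
  "Passes accurate", "Passes accurate", "Passes accurate", "Passes accurate",
  "Lost balls", "Lost balls (in opp. half)", "Passes (inaccurate)",
  "Interceptions (in opp. half)", "Interceptions", "Positional attacks"
))
passes <- c("Passes", "passes", "Assists", "Crosses")

# Hypothetical helper: one logical vector per pattern (literal match),
# combined element-wise with `|`
matches_any <- function(x, patterns) {
  Reduce(`|`, lapply(patterns, function(p) grepl(p, x, fixed = TRUE)))
}

result <- mydf %>%
  mutate(newcol = case_when(matches_any(Action, passes) ~ "passes"))
result
```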
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | JAdel | 
| Solution 2 | deschen | 
| Solution 3 | akrun | 
| Solution 4 | |
