In dplyr using str_detect and case_when in R
This is my df:
mydf <- structure(list(Action = c("Passes accurate", "Passes accurate",
"Passes accurate", "Passes accurate", "Lost balls", "Lost balls (in opp. half)",
"Passes (inaccurate)", "Interceptions (in opp. half)", "Interceptions",
"Positional attacks")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
I have this vector:
passes <- c('Passes','passes','Assists','Crosses')
I am trying to do this:
mydf %>% mutate(newcol = case_when(str_detect(Action, passes) ~ 'passes'))
But only the first row is filled with passes. I expected the first 4 rows, and also the 7th row, to be filled with passes. How can I achieve this with the case_when function?
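A likely explanation for the single match, sketched in base R: when the pattern argument is a vector, matching is done pairwise (string 1 against pattern 1, string 2 against pattern 2, and so on) with the shorter vector recycled, rather than testing every string against every pattern. The exact behaviour depends on the stringr version (recent releases are understood to raise a recycling error for mismatched lengths instead), so the snippet below imitates the pairwise recycling explicitly with rep_len() and mapply(). The names action, recycled, and pairwise are illustrative and not part of the question.

```r
# The Action column from the question, as a plain character vector.
action <- c("Passes accurate", "Passes accurate", "Passes accurate",
            "Passes accurate", "Lost balls", "Lost balls (in opp. half)",
            "Passes (inaccurate)", "Interceptions (in opp. half)",
            "Interceptions", "Positional attacks")
passes <- c("Passes", "passes", "Assists", "Crosses")

# Recycle the 4 patterns to length 10: Passes, passes, Assists, Crosses, Passes, ...
recycled <- rep_len(passes, length(action))

# Pairwise test: does action[i] contain recycled[i] as a literal substring?
pairwise <- mapply(function(p, x) grepl(p, x, fixed = TRUE),
                   recycled, action, USE.NAMES = FALSE)

# Only row 1 happens to be paired with a pattern it contains ("Passes").
which(pairwise)
```

Rows 2 to 4 and row 7 are paired with 'passes', 'Assists', or 'Crosses', none of which they contain, which is why they stay empty.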
Solution 1:[1]
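I used str_sub() for this. It checks only whether Action starts with 'Passes', so the other entries of the passes vector are not used.
library(dplyr)
library(stringr)
mydf <- mydf %>% mutate(newcol = case_when(str_sub(Action,1,6) == 'Passes' ~ "passes"))
print(mydf)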
   Action                       newcol
   <chr>                        <chr>
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   NA
 6 Lost balls (in opp. half)    NA
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) NA
 9 Interceptions                NA
10 Positional attacks           NA
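If the prefix check should cover every term in passes rather than only 'Passes', one possible generalization is sketched below in base R. It assumes that "starts with any of the terms" is the intended rule, which the answer does not state; starts_with_any and newcol are illustrative names.

```r
action <- c("Passes accurate", "Passes accurate", "Passes accurate",
            "Passes accurate", "Lost balls", "Lost balls (in opp. half)",
            "Passes (inaccurate)", "Interceptions (in opp. half)",
            "Interceptions", "Positional attacks")
passes <- c("Passes", "passes", "Assists", "Crosses")

# One logical vector per term, then combined element-wise with OR:
# TRUE when action[i] starts with at least one term in passes.
starts_with_any <- Reduce(`|`, lapply(passes, function(p) startsWith(action, p)))

# Label matching rows, leave the rest as NA.
newcol <- ifelse(starts_with_any, "passes", NA_character_)
```

Unlike str_sub(Action, 1, 6), this does not depend on every term having the same length.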
Solution 2:[2]
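Collapse the vector into a single regular expression with paste0(passes, collapse = '|'), so that every row is tested against all of the terms:
library(tidyverse)
mydf %>%
  mutate(newcol = if_else(str_detect(Action, paste0(passes, collapse = '|')), 'passes', NA_character_))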
# A tibble: 10 x 2
   Action                       newcol
   <chr>                        <chr>
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   <NA>
 6 Lost balls (in opp. half)    <NA>
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) <NA>
 9 Interceptions                <NA>
10 Positional attacks           <NA>
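Since the question asks specifically for case_when, the same collapsed pattern can be used there as well. The following is a minimal sketch, assuming dplyr and stringr are installed; pattern and result are illustrative names, and mydf is rebuilt with tibble() so that the snippet runs on its own.

```r
library(dplyr)
library(stringr)

mydf <- tibble(Action = c(
  "Passes accurate", "Passes accurate", "Passes accurate", "Passes accurate",
  "Lost balls", "Lost balls (in opp. half)", "Passes (inaccurate)",
  "Interceptions (in opp. half)", "Interceptions", "Positional attacks"
))
passes <- c("Passes", "passes", "Assists", "Crosses")

# Build one alternation pattern: "Passes|passes|Assists|Crosses"
pattern <- str_c(passes, collapse = "|")

# Rows that match any term get "passes"; with no fallback branch,
# case_when() leaves all other rows as NA.
result <- mydf %>%
  mutate(newcol = case_when(str_detect(Action, pattern) ~ "passes"))
```

Further conditions can be added as extra branches of case_when. Wrapping the pattern in regex(pattern, ignore_case = TRUE) should make listing both 'Passes' and 'passes' unnecessary.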
Solution 3:[3]
Another option is to use regex_left_join() from fuzzyjoin, which joins each row of mydf to the patterns it matches:
library(fuzzyjoin)
library(dplyr)
regex_left_join(mydf, tibble(passes, newcol = "passes"),
                by = c("Action" = "passes")) %>%
  select(-passes)
Output:
# A tibble: 10 × 2
   Action                       newcol
   <chr>                        <chr>
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   <NA>
 6 Lost balls (in opp. half)    <NA>
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) <NA>
 9 Interceptions                <NA>
10 Positional attacks           <NA>
Solution 4:[4]
You'll need to use paste(collapse = "|") to combine the vector into one single string separated by "|", so that grepl() can check whether "Action" matches element 1 or element 2 or element 3 and so on.
library(dplyr)
passes <- c('Passes','passes','Assists','Crosses')
mydf %>% mutate(newcol = case_when(grepl(paste(passes, collapse = "|"), Action) ~ "passes"))
# A tibble: 10 x 2
   Action                       newcol
   <chr>                        <chr>
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   NA
 6 Lost balls (in opp. half)    NA
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) NA
 9 Interceptions                NA
10 Positional attacks           NA
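One caveat with the collapsed pattern is that each term is interpreted as a regular expression. That is harmless for the terms in the question, but a term containing characters such as parentheses would behave differently. The base R sketch below illustrates this with a hypothetical terms vector (not from the question) and shows literal matching with fixed = TRUE as one way around it.

```r
action <- c("Passes accurate", "Passes accurate", "Passes accurate",
            "Passes accurate", "Lost balls", "Lost balls (in opp. half)",
            "Passes (inaccurate)", "Interceptions (in opp. half)",
            "Interceptions", "Positional attacks")

# Hypothetical search terms, one of which contains regex metacharacters.
terms <- c("Passes (inaccurate)", "Lost balls")

# As a regex, "(inaccurate)" is a capture group, so the literal "(" in
# row 7 is never matched; only the "Lost balls" rows are found.
as_regex <- grepl(paste(terms, collapse = "|"), action)

# fixed = TRUE treats each term as a literal string; the per-term results
# are combined element-wise with OR.
as_literal <- Reduce(`|`, lapply(terms, function(t) grepl(t, action, fixed = TRUE)))

which(as_regex)    # rows matched when terms are read as regex
which(as_literal)  # rows matched when terms are read literally
```

fixed = TRUE only accepts a single pattern, which is why the terms are looped over instead of being pasted together.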
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | JAdel |
Solution 2 | deschen |
Solution 3 | akrun |
Solution 4 |