str_detect removing some but not all strings with specified ending
In a pipe, I'd like to remove any string that ends in either of two suffixes. In this example they are ".o" and ".t". Some of the strings get removed, but not all of them, and I can't figure out why. I suspect something is wrong in the 'pattern = ' argument.
ex1 <- structure(list(variables = structure(1:18, .Label = c("canopy15",
"canopy16", "DistanceToRoad", "DistanceToEdge", "EdgeDistance",
"TrailDistance", "CARCOR.o", "EUOALA.o", "FAGGRA.o", "LINBEN.o",
"MALSP..o", "PRUSER.o", "ROSMUL.o", "RUBPHO.o", "VIBDEN.o", "ACERUB.t",
"FAGGRA.t", "NYSSYL.t"), class = "factor")), row.names = c(NA,
-18L), class = "data.frame")
ex1 %>%
dplyr::filter(stringr::str_detect(string = variables,
pattern = c("\\.o$", "\\.t$"),
negate = TRUE))
##output
# variables
# 1 canopy15
# 2 canopy16
# 3 DistanceToRoad
# 4 DistanceToEdge
# 5 EdgeDistance
# 6 TrailDistance
# 7 EUOALA.o
# 8 LINBEN.o
# 9 PRUSER.o
# 10 RUBPHO.o
# 11 FAGGRA.t
Solution 1:[1]
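The pattern has multiple elements, so it is recycled along the rows: "\\.o$" is checked for the first row, "\\.t$" for the next row, and so on, alternating. That is why only some of the matching strings are removed. Put both endings in a single pattern instead: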
ex1 %>%
  dplyr::filter(stringr::str_detect(string = variables,
                                    pattern = "\\.(o|t)$",
                                    negate = TRUE))
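To make the recycling visible, here is a minimal sketch that builds the alternating pattern vector by hand and compares it with the single-pattern version. The names `recycled`, `kept_recycled` and `kept_single` are made up for this illustration. The pattern is expanded explicitly with `rep_len()` because, as far as I know, stringr 1.5.0 and later raise an error for a length-2 pattern against 18 strings instead of recycling silently.

```r
library(stringr)

variables <- c("canopy15", "canopy16", "DistanceToRoad", "DistanceToEdge",
               "EdgeDistance", "TrailDistance", "CARCOR.o", "EUOALA.o",
               "FAGGRA.o", "LINBEN.o", "MALSP..o", "PRUSER.o", "ROSMUL.o",
               "RUBPHO.o", "VIBDEN.o", "ACERUB.t", "FAGGRA.t", "NYSSYL.t")

# What recycling a length-2 pattern amounts to: odd positions are tested
# against "\\.o$", even positions against "\\.t$".
recycled <- rep_len(c("\\.o$", "\\.t$"), length(variables))

# Side-by-side view of which pattern each string was actually tested against.
data.frame(variables, recycled, matched = str_detect(variables, recycled))

# Alternating patterns: 11 names survive, the same rows as in the question.
kept_recycled <- variables[str_detect(variables, recycled, negate = TRUE)]
length(kept_recycled)

# One pattern covering both endings: only the six unsuffixed names survive.
kept_single <- variables[str_detect(variables, "\\.(o|t)$", negate = TRUE)]
kept_single
```

For example, "EUOALA.o" sits at position 8, so it is only tested against "\\.t$" and slips through.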
Solution 2:[2]
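For those not as well-versed in regular expressions, here is a simpler answer: give filter() one condition per ending. Conditions separated by commas are combined with AND, so a row is kept only if it matches neither ending.
library(tidyverse)
# "\\." matches a literal dot; an unescaped "." would match any character
ex1 %>%
  filter(str_detect(string = variables, pattern = "\\.t$", negate = TRUE),
         str_detect(string = variables, pattern = "\\.o$", negate = TRUE))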
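One thing to watch with these patterns: in a regular expression an unescaped `.` matches any character, so `.t$` means "any character followed by a final t". The sketch below shows the difference on a small vector. "forest" and "photo" are hypothetical names that are not in the question's data, added only for illustration. It also shows base R's `endsWith()`, which compares literal text and avoids regular expressions entirely.

```r
library(stringr)

# "forest" and "photo" are hypothetical names, not part of the original data.
x <- c("canopy15", "forest", "photo", "CARCOR.o", "ACERUB.t")

# Unescaped dot: also flags "forest", because "s" followed by "t" ends the string.
loose <- str_detect(x, ".t$")

# Escaped dot: only flags names that end in a literal ".t".
strict <- str_detect(x, "\\.t$")

# Base R alternative with no regular expression: plain suffix comparison.
keep <- !(endsWith(x, ".o") | endsWith(x, ".t"))
x[keep]
```

With the question's data both forms happen to give the same result, since none of the unsuffixed names end in "t" or "o", but the escaped form is the safer habit.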
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | langtang |
Solution 2 |