'Is there an R function to escape a string for regex characters

I'm wanting to build a regex expression substituting in some strings to search for, and so these string need to be escaped before I can put them in the regex, so that if the searched for string contains regex characters it still works.

Some languages have functions that will do this for you (e.g. python re.escape: https://stackoverflow.com/a/10013356/1900520). Does R have such a function?

For example (made up function):

x = "foo[bar]"
y = escape(x) # y should now be "foo\\[bar\\]"


Solution 1:[1]

Apparently there is a function called escapeRegex in the Hmisc package. The function itself has the following definition for an input value of 'string':

gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", string)

My previous answer:

I'm not sure if there is a built in function but you could make one to do what you want. This basically just creates a vector of the values you want to replace and a vector of what you want to replace them with and then loops through those making the necessary replacements.

re.escape <- function(strings){
    vals <- c("\\\\", "\\[", "\\]", "\\(", "\\)", 
              "\\{", "\\}", "\\^", "\\$","\\*", 
              "\\+", "\\?", "\\.", "\\|")
    replace.vals <- paste0("\\\\", vals)
    for(i in seq_along(vals)){
        strings <- gsub(vals[i], replace.vals[i], strings)
    }
    strings
}

Some output

> test.strings <- c("What the $^&(){}.*|?", "foo[bar]")
> re.escape(test.strings)
[1] "What the \\$\\^&\\(\\)\\{\\}\\.\\*\\|\\?"
[2] "foo\\[bar\\]"  

Solution 2:[2]

An easier way than @ryanthompson function is to simply prepend \\Q and postfix \\E to your string. See the help file ?base::regex.

Solution 3:[3]

Use the rex package

These days, I write all my regular expressions using rex. For your specific example, rex does exactly what you want:

library(rex)
library(assertthat)
x = "foo[bar]"
y = rex(x)
assert_that(y == "foo\\[bar\\]")

But of course, rex does a lot more than that. The question mentions building a regex, and that's exactly what rex is designed for. For example, suppose we wanted to match the exact string in x, with nothing before or after:

x = "foo[bar]"
y = rex(start, x, end)

Now y is ^foo\[bar\]$ and will only match the exact string contained in x.

Solution 4:[4]

According to ?regex:

The symbol \w matches a ‘word’ character (a synonym for [[:alnum:]_], an extension) and \W is its negation ([^[:alnum:]_]).

Therefore, using capture groups, (\\W), we can detect the occurrences of non-word characters and escape it with the \\1-syntax:

> gsub("(\\W)", "\\\\\\1", "[](){}.|^+$*?\\These are words")
[1] "\\[\\]\\(\\)\\{\\}\\.\\|\\^\\+\\$\\*\\?\\\\These\\ are\\ words"

Or similarly, replacing "([^[:alnum:]_])" for "(\\W)".

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Kalin
Solution 3 Ryan C. Thompson
Solution 4 antonio