'Extracting rows based on more than two partial strings that must all be part of the string
I want to extract rows that must contain two or more partial strings. For example, suppose I have the following data.table
df <- data.table(player = c('A', 'B', 'C', 'D', 'E', 'F', 'G'),
                 position = c('S Guard', 'P Guaaard', 'P Guard', 'S Forward',
                            'P Forward', 'Center', 'Center'),
                 points = c(28, 17, 19, 14, 23, 26, 5))
I know I can extract rows with either of the partial strings Gua or rd by using
df[grep("Gua|rd", df$position), ]
What if I want the the string to contain both Gua and rd? Tried
df[grep("Gua&rd", df$position), ]
both with and without space but doesn't work.
EDIT WITH MORE INFO
I'm actually dealing with relatively large datasets and I am using strings in one to extract data from another, something like this:
df1 <- data.frame(player = c('A', 'B', 'C', 'D', 'E', 'F', 'G'),
                 position1 = c('S Guard', 'P Guard', 'P Guard', 'S Forward',
                            'P Forward', 'Center', 'Center'),
                 position2 = c('S Grd', 'P Guard', 'P rdGua', 'S Forward',
                             'P Forward', 'Center', 'Center'))
df2 <- data.table(player = c('A', 'B', 'C', 'D', 'E', 'F', 'G'),
                 points = c(28, 17, 19, 14, 23, 26, 5))
df2[df1[,3] %like% "Gua|rd",c(1,2)]
The order therefore shouldn't matter i.e Guard and rdGua should return true if using & in place of |.
Solution 1:[1]
If you replace | with .* you get the desired result. In stead of grep (or grepl), you can also use like from data.table.
.* indicates that every character is allowed in between "Gua" and "rd".
Two options:
# option 1
df[grep("Gua.*rd", position)]
# option 2
df[like(position, "Gua.*rd")]
both give:
player position points 1: A S Guard 28 2: B P Guaaard 17 3: C P Guard 19
Because you are using the data.table-package, you don't need the df$ part in between the square brackets.
In response to the extra information, you could achieve that with:
df2[df1[[3]] %like% "(Gua.*rd)|(rd.*Gua)"]
which gives:
player points 1: B 17 2: C 19
The grep/grepl alternatieve:
df2[grepl("(Gua.*rd)|(rd.*Gua)", df1[[3]])]
Solution 2:[2]
This is probably an overkill but if you don't know the order and want to avoid having multiple grep() calls you could do:
# Helper function
grepAnd <- function(vect, strings) {
  grep(
    do.call(sprintf, c(strrep('(?=.*%s)', length(strings)), as.list(strings))), 
    vect, 
    perl = TRUE
  )
}
inp <- c('Gua', 'rd')
df[grepAnd(position, inp)]
#    player  position points
# 1:      A   S Guard     28
# 2:      B P Guaaard     17
# 3:      C   P Guard     19
# Works even though the arguments are reversed
inp <- rev(inp)
df[grepAnd(position, inp)]
#    player  position points
# 1:      A   S Guard     28
# 2:      B P Guaaard     17
# 3:      C   P Guard     19
# Can search for more than two patterns
inp <- c(inp, 'P')
df[grepAnd(position, inp)]
#    player  position points
# 1:      B P Guaaard     17
# 2:      C   P Guard     19
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | |
| Solution 2 | sindri_baldur | 
