'Extracting rows based on more than two partial strings that must all be part of the string

I want to extract rows that must contain two or more partial strings. For example, suppose I have the following data.table

df <- data.table(player = c('A', 'B', 'C', 'D', 'E', 'F', 'G'),
                 position = c('S Guard', 'P Guaaard', 'P Guard', 'S Forward',
                            'P Forward', 'Center', 'Center'),
                 points = c(28, 17, 19, 14, 23, 26, 5))

I know I can extract rows with either of the partial strings Gua or rd by using

df[grep("Gua|rd", df$position), ]

What if I want the the string to contain both Gua and rd? Tried

df[grep("Gua&rd", df$position), ]

both with and without space but doesn't work.

EDIT WITH MORE INFO

I'm actually dealing with relatively large datasets and I am using strings in one to extract data from another, something like this:

df1 <- data.frame(player = c('A', 'B', 'C', 'D', 'E', 'F', 'G'),
                 position1 = c('S Guard', 'P Guard', 'P Guard', 'S Forward',
                            'P Forward', 'Center', 'Center'),
                 position2 = c('S Grd', 'P Guard', 'P rdGua', 'S Forward',
                             'P Forward', 'Center', 'Center'))

df2 <- data.table(player = c('A', 'B', 'C', 'D', 'E', 'F', 'G'),
                 points = c(28, 17, 19, 14, 23, 26, 5))

df2[df1[,3] %like% "Gua|rd",c(1,2)]

The order therefore shouldn't matter i.e Guard and rdGua should return true if using & in place of |.



Solution 1:[1]

If you replace | with .* you get the desired result. In stead of grep (or grepl), you can also use like from .

.* indicates that every character is allowed in between "Gua" and "rd".

Two options:

# option 1
df[grep("Gua.*rd", position)]

# option 2
df[like(position, "Gua.*rd")]

both give:

   player  position points
1:      A   S Guard     28
2:      B P Guaaard     17
3:      C   P Guard     19

Because you are using the -package, you don't need the df$ part in between the square brackets.


In response to the extra information, you could achieve that with:

df2[df1[[3]] %like% "(Gua.*rd)|(rd.*Gua)"]

which gives:

   player points
1:      B     17
2:      C     19

The grep/grepl alternatieve:

df2[grepl("(Gua.*rd)|(rd.*Gua)", df1[[3]])]

Solution 2:[2]

This is probably an overkill but if you don't know the order and want to avoid having multiple grep() calls you could do:

# Helper function
grepAnd <- function(vect, strings) {
  grep(
    do.call(sprintf, c(strrep('(?=.*%s)', length(strings)), as.list(strings))), 
    vect, 
    perl = TRUE
  )
}

inp <- c('Gua', 'rd')
df[grepAnd(position, inp)]

#    player  position points
# 1:      A   S Guard     28
# 2:      B P Guaaard     17
# 3:      C   P Guard     19

# Works even though the arguments are reversed
inp <- rev(inp)
df[grepAnd(position, inp)]
#    player  position points
# 1:      A   S Guard     28
# 2:      B P Guaaard     17
# 3:      C   P Guard     19


# Can search for more than two patterns
inp <- c(inp, 'P')
df[grepAnd(position, inp)]
#    player  position points
# 1:      B P Guaaard     17
# 2:      C   P Guard     19

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 sindri_baldur