'R data.table struggling with conditional subsetting when column name is predefined elsewhere

Let's say I have a data table

library(data.table)
DT <- data.table(x=c(1,1,0,0),y=c(0,1,2,3))
column_name <- "x"

   x y
1: 1 0
2: 1 1
3: 0 2
4: 0 3

And I want to access all the rows where x = 1, but by using column_name.

The desired output should behave like this:

DT[x==1,]
   x y
1: 1 0
2: 1 1

but with x replaced by column_name in the input.

Note that this problem is similar to but not quite the same as Select subset of columns in data.table R, and the solution there (using with=FALSE) doesn't work here.

Here are all the things I've tried. None of them work.

DT[column_name ==1,]
DT[.column_name ==1,]
DT[.(column_name) ==1,]
DT[..column_name ==1,]
DT[."column_name" ==1,]
DT[,column_name ==1,]
DT[,column_name ==1,with=TRUE]
DT[,column_name ==1,with=FALSE]
DT[,.column_name ==1,with=TRUE]
DT[,.column_name ==1,with=FALSE]
DT[,..column_name ==1,with=TRUE]
DT[,..column_name ==1,with=FALSE]
DT[,."column_name" ==1,with=TRUE]
DT[,.column_name ==1,with=FALSE]
DT[column_name ==1,with=TRUE]
DT[column_name ==1,with=FALSE]
DT[[column_name==1,]]
subset(DT,column_name==1)

I also have options(datatable.WhenJisSymbolThenCallingScope=TRUE) enabled

There's obviously some kind of lexical trick I'm missing. I've spent several hours looking through vignettes and SO questions to no avail.



Solution 1:[1]

I can imagine this was very frustrating for you. I applaud the number of things you tried before posting. Here's one approach:

DT[get(column_name) == 1,]
   x y
1: 1 0
2: 1 1

If you need to use column_name in J, you can use get(..column_name):

DT[,get(..column_name)]
[1] 1 1 0 0

The .. instructs evaluation to occur in the parent environment.

Another approach for using a string in either I or J is with eval(as.name(column_name)):

DT[eval(as.name(column_name)) == 1]
   x y
1: 1 0
2: 1 1

DT[,eval(as.name(column_name))]
[1] 1 1 0 0

Solution 2:[2]

You can subset the column by name and then select rows.

library(data.table)
DT[DT[[column_name]] == 1]

#   x y
#1: 1 0
#2: 1 1

Solution 3:[3]

A little caveat, using get() directly with paste0() doesn't work. You have to assign the paste to a variable first, like:

# Doesn't work:
dt[get(paste0(column_name, 'some_string')) == 1]

# Does work:
this_col_name = paste0(column_name, 'some_string')
dt[get(this_col_name) == 1]

Solution 4:[4]

An additional answer I just discovered: If there are multiple columns named this way and you want to return all of them, don't use get, use mget.

Example:

df <- data.table(x=1:4,y=1:4,z=1:4,w=1:4)    # here's my data table
desired_columns <- c("y","z","w")            # I want to return only columns Y, Z and W

if I try:

> df[,get(desired_columns)]
Error in get(desired_columns) : first argument has length > 1

Instead:

> df[,mget(desired_columns)]
   y z w
1: 1 1 1
2: 2 2 2
3: 3 3 3
4: 4 4 4

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 Kaleb Coberly
Solution 4 Ingolifs