'R data.table struggling with conditional subsetting when column name is predefined elsewhere
Let's say I have a data table
library(data.table)
DT <- data.table(x=c(1,1,0,0),y=c(0,1,2,3))
column_name <- "x"
x y
1: 1 0
2: 1 1
3: 0 2
4: 0 3
And I want to access all the rows where x = 1, but by using column_name.
The desired output should behave like this:
DT[x==1,]
x y
1: 1 0
2: 1 1
but with x
replaced by column_name
in the input.
Note that this problem is similar to but not quite the same as Select subset of columns in data.table R, and the solution there (using with=FALSE) doesn't work here.
Here are all the things I've tried. None of them work.
DT[column_name ==1,]
DT[.column_name ==1,]
DT[.(column_name) ==1,]
DT[..column_name ==1,]
DT[."column_name" ==1,]
DT[,column_name ==1,]
DT[,column_name ==1,with=TRUE]
DT[,column_name ==1,with=FALSE]
DT[,.column_name ==1,with=TRUE]
DT[,.column_name ==1,with=FALSE]
DT[,..column_name ==1,with=TRUE]
DT[,..column_name ==1,with=FALSE]
DT[,."column_name" ==1,with=TRUE]
DT[,.column_name ==1,with=FALSE]
DT[column_name ==1,with=TRUE]
DT[column_name ==1,with=FALSE]
DT[[column_name==1,]]
subset(DT,column_name==1)
I also have options(datatable.WhenJisSymbolThenCallingScope=TRUE)
enabled
There's obviously some kind of lexical trick I'm missing. I've spent several hours looking through vignettes and SO questions to no avail.
Solution 1:[1]
I can imagine this was very frustrating for you. I applaud the number of things you tried before posting. Here's one approach:
DT[get(column_name) == 1,]
x y
1: 1 0
2: 1 1
If you need to use column_name
in J
, you can use get(..column_name)
:
DT[,get(..column_name)]
[1] 1 1 0 0
The ..
instructs evaluation to occur in the parent environment.
Another approach for using a string in either I
or J
is with eval(as.name(column_name))
:
DT[eval(as.name(column_name)) == 1]
x y
1: 1 0
2: 1 1
DT[,eval(as.name(column_name))]
[1] 1 1 0 0
Solution 2:[2]
You can subset the column by name and then select rows.
library(data.table)
DT[DT[[column_name]] == 1]
# x y
#1: 1 0
#2: 1 1
Solution 3:[3]
A little caveat, using get()
directly with paste0()
doesn't work. You have to assign the paste to a variable first, like:
# Doesn't work:
dt[get(paste0(column_name, 'some_string')) == 1]
# Does work:
this_col_name = paste0(column_name, 'some_string')
dt[get(this_col_name) == 1]
Solution 4:[4]
An additional answer I just discovered:
If there are multiple columns named this way and you want to return all of them, don't use get
, use mget
.
Example:
df <- data.table(x=1:4,y=1:4,z=1:4,w=1:4) # here's my data table
desired_columns <- c("y","z","w") # I want to return only columns Y, Z and W
if I try:
> df[,get(desired_columns)]
Error in get(desired_columns) : first argument has length > 1
Instead:
> df[,mget(desired_columns)]
y z w
1: 1 1 1
2: 2 2 2
3: 3 3 3
4: 4 4 4
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | Kaleb Coberly |
Solution 4 | Ingolifs |