Using setDT inside a function
I'm writing a function that, among other things, coerces the input into a data.table.
library(data.table)
df <- data.frame(id = 1:10)
f <- function(df){setDT(df)}
f(df)
df[, temp := 1]
However, the last command outputs the following warning:
Warning message: In `[.data.table`(df, , `:=`(temp, 1)) : Invalid .internal.selfref detected and fixed by taking a copy of the whole table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or been created manually using structure() or similar). Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<=v3.0.2, list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named objects); please upgrade to R>v3.0.2 if that is biting. If this message doesn't help, please report to datatable-help so the root cause can be fixed.
I'm using v1.9.3 of data.table and R 3.1.1. Does this mean df is copied at some point? How can I avoid this warning?
Edit:
The code of setDT actually uses NSE. So this seems to work:
df1 <- data.frame(id = 1:10)
f <- function(df){eval(substitute(setDT(df)),parent.frame())}
f(df1)
df1[, temp := 1]
It seems I can do other things with df within the function f, like:
df1 <- data.frame(id = 1:10)
f <- function(df){
  eval(substitute(setDT(df)), parent.frame())
  df[, temp := 1]
}
f(df1)
Is this the right way to do it?
Solution 1:[1]
Great question! The warning message should say: ... and fixed by taking a shallow copy of the whole table .... Will fix this.
setDT does two things:
- set the class to data.table from data.frame/list
- use alloc.col to over-allocate columns (so that := can be used directly)
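Both effects can be observed from the outside. The snippet below is only a minimal sketch, assuming setDT is called at the top level (so the symbol it re-binds is the one being inspected) and using truelength, which data.table exports, to read the number of allocated column slots. The variable names are illustrative.

```r
library(data.table)

df <- data.frame(id = 1:10)
class_before <- class(df)       # plain data.frame before conversion

setDT(df)                       # called at top level, converts df by reference

class_after <- class(df)        # now includes "data.table"
slots <- truelength(df)         # column slots allocated after over-allocation
n_cols <- length(df)            # columns actually in use

print(class_before)
print(class_after)
print(c(allocated = slots, used = n_cols))
```

When the allocated count exceeds the used count, := has room to add columns without copying.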
The second step requires a shallow copy if the input is not already a data.table. This is why we assign the value back to the symbol in its environment (setDT's parent frame). But the parent frame for setDT is your function f(). Therefore the setDT(df) within your function goes through smoothly, but the df that resides in the global environment only has its class changed, not the over-allocation (as the shallow copy severed the link).
And in the next step, := detects that and shallow copies once again to over-allocate.
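To see this state for yourself, you can inspect the global df right after calling the function from the question. This is a diagnostic sketch only: the values reported may differ between data.table versions, so no expected output is shown.

```r
library(data.table)

df <- data.frame(id = 1:10)
f <- function(df) { setDT(df) }   # same function as in the question
f(df)

# Inspect the global binding after the call.
print(class(df))                  # was the class changed by reference?
print(truelength(df))             # allocated column slots for this binding
print(length(df))                 # columns actually in use
```

Comparing the last two numbers shows whether the global binding has spare column slots for := to use.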
For now, the idea is to use setDT to convert the input to a data.table before passing it to a function. But I'd like these cases to be resolved (will take a look).
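A minimal sketch of that workaround, assuming the conversion happens in the caller's environment and the function only ever receives a data.table. The helper name add_temp is hypothetical and not part of data.table.

```r
library(data.table)

# Hypothetical helper: expects a data.table and adds a column by reference.
add_temp <- function(dt) {
  stopifnot(is.data.table(dt))    # refuse plain data.frames
  dt[, temp := 1]                 # modifies the caller's table in place
  invisible(dt)
}

df <- data.frame(id = 1:10)
setDT(df)                         # convert where df lives, before the call
add_temp(df)
print(names(df))
```

Because setDT runs where df is defined, the re-bound, over-allocated table is the same one the function later modifies.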
Thanks a bunch!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Axeman |