Trying to benchmark dplyr vs data.table
Why does this code not work? How can I benchmark these two expressions?
library(data.table)
library(dplyr)
dt <- as.data.table(mtcars)
(lb <- bench::mark(
dt[, .N, by = .(am, gear) ],
count(dt, am, gear)
))
Error in all.equal.data.table(results$result[[1]], results$result[[i]]) : 'target' and 'current' must both be data.tables
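By default, bench::mark() verifies that every expression returns the same result before reporting timings, which is what fails here. If only the timings matter, the documented check = FALSE argument switches that comparison off (a minimal sketch; note the results are then no longer verified to be equivalent):

```r
library(data.table)
library(dplyr)

dt <- as.data.table(mtcars)

# check = FALSE skips the all.equal() comparison of the results,
# so bench::mark() only measures time and memory.
lb <- bench::mark(
  dt[, .N, by = .(am, gear)],
  count(dt, am, gear),
  check = FALSE
)
```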
Solution 1:[1]
The microbenchmark package would work very well in this situation.
library(data.table)
library(dplyr)
library(microbenchmark)
dt <- as.data.table(mtcars)
microbenchmark::microbenchmark(
dt = dt[, .N, by = .(am, gear) ],
dplyr = count(dt, am, gear)
)
# Unit: microseconds
# expr min lq mean median uq max neval
# dt 366.895 441.917 666.3117 471.690 545.9255 8154.319 100
# dplyr 934.658 1049.023 1649.7788 1144.242 1255.5120 29170.144 100
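Note that microbenchmark() only times the expressions and never compares their results, so equivalence is worth verifying separately. A sketch, using keyby so both results share the same row order (as in Solution 2 below):

```r
library(data.table)
library(dplyr)

dt <- as.data.table(mtcars)

# keyby sorts the groups, matching count()'s default ordering;
# check.attributes = FALSE ignores attribute-level differences
# such as the key and the count column name ("N" vs "n").
res_dt    <- dt[, .N, keyby = .(am, gear)]
res_dplyr <- count(dt, am, gear)
all.equal(res_dt, res_dplyr, check.attributes = FALSE)
```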
Solution 2:[2]
I prefer to understand why bench::mark()'s default check is failing.
In this case, the differences are caused by:
- a different row order (data.table's by = returns groups in order of appearance, while count() seems to order the rows by default)
- different attributes behind the scenes.
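The row-order difference can be seen directly (a small sketch on the same data):

```r
library(data.table)
library(dplyr)

dt <- as.data.table(mtcars)

dt[, .N, by = .(am, gear)]     # groups in order of first appearance
dt[, .N, keyby = .(am, gear)]  # keyby additionally sorts (and keys) the result
count(dt, am, gear)            # sorted by the grouping columns by default
```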
The code below fixes both issues and still checks the results:
library(data.table)
library(dplyr)
dt <- as.data.table(mtcars)
(lb <- bench::mark(
dt[, .N, keyby = .(am, gear)],
count(dt, am, gear),
check = function(x, y) all.equal(x, y, check.attributes = FALSE)
))
# A tibble: 2 × 13
#   expression                         min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
#   <bch:expr>                    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
# 1 dt[, .N, keyby = .(am, gear)] 617.3µs  688.1µs      1333.    33.5KB     4.17   640     2      480ms <data.table [4 × 3]>
# 2 count(dt, am, gear)            9.04ms   10.7ms       93.8    10.7KB     2.09    45     1      480ms <data.table [4 × 3]>
# … with 3 more variables: memory <list>, time <list>, gc <list>
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Zach Schuster |
| Solution 2 | Uwe |