Check whether all elements of a list are equal in R

I have a list of several vectors. I would like to check whether all vectors in the list are equal. There's identical, which only works for pairwise comparison, so I wrote the following function, which looks ugly to me. Still, I did not find a better solution. Here's my reproducible example:

test_true <- list(a=c(1,2,3),b=c(1,2,3),d=c(1,2,3))
test_false <- list(a=c(1,2,3),b=c(1,2,3),d=c(1,32,13))

compareList <- function(li){
  stopifnot(length(li) > 1)
  l <- length(li)
  res <- lapply(li[-1],function(X,x) identical(X,x),x=li[[1]])
  res <- all(unlist(res))
  res
}

compareList(test_true)
compareList(test_false)

Any suggestions? Are there any native checks for identical for more than just pairwise comparison?



Solution 1:[1]

How about

allSame <- function(x) length(unique(x)) == 1

allSame(test_true)
# [1] TRUE
allSame(test_false)
# [1] FALSE

As @JoshuaUlrich pointed out below, unique may be slow on lists. Also, identical and unique may use different criteria. Reduce is a function I recently learned about for extending pairwise operations:

identicalValue <- function(x,y) if (identical(x,y)) x else FALSE
Reduce(identicalValue,test_true)
# [1] 1 2 3
Reduce(identicalValue,test_false)
# [1] FALSE

This inefficiently continues making comparisons after finding the first non-match. My crude solution to that would be to write else break instead of else FALSE, which throws an error and so aborts the remaining comparisons.
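One way to stop at the first mismatch without relying on an error is a plain loop that returns early. The following is only a sketch; the name all_identical_loop is made up for illustration and does not come from any of the answers. It also sidesteps an ambiguity in the Reduce version, which returns FALSE for a list whose elements are all FALSE.

```r
# Hypothetical helper: compare every element to the first, stopping early.
all_identical_loop <- function(x) {
  # Assumption: a list with 0 or 1 elements counts as "all equal".
  if (length(x) < 2L) return(TRUE)
  first <- x[[1L]]
  for (el in x[-1L]) {
    if (!identical(first, el)) return(FALSE)  # stop at the first mismatch
  }
  TRUE
}

test_true <- list(a = c(1, 2, 3), b = c(1, 2, 3), d = c(1, 2, 3))
test_false <- list(a = c(1, 2, 3), b = c(1, 2, 3), d = c(1, 32, 13))

all_identical_loop(test_true)
# [1] TRUE
all_identical_loop(test_false)
# [1] FALSE
all_identical_loop(list(FALSE, FALSE))
# [1] TRUE
```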

Solution 2:[2]

I would do:

all.identical <- function(l) all(mapply(identical, head(l, 1), tail(l, -1)))

all.identical(test_true)
# [1] TRUE
all.identical(test_false)
# [1] FALSE

Solution 3:[3]

To summarize the solutions, here is the data for the tests:

x1 <- as.list(as.data.frame(replicate(1000, 1:100)))
x2 <- as.list(as.data.frame(replicate(1000, sample(1:100, 100))))

Solutions:

comp_list1 <- function(x) length(unique.default(x)) == 1L
comp_list2 <- function(x) all(vapply(x[-1], identical, logical(1L), x = x[[1]]))
comp_list3 <- function(x) all(vapply(x[-1], function(x2) all(x[[1]] == x2), logical(1L)))
comp_list4 <- function(x) sum(duplicated.default(x)) == length(x) - 1L

Test on the data:

for (i in 1:4) cat(match.fun(paste0("comp_list", i))(x1), " ")
#> TRUE  TRUE  TRUE  TRUE   
for (i in 1:4) cat(match.fun(paste0("comp_list", i))(x2), " ")
#> FALSE  FALSE  FALSE  FALSE  

Benchmarks:

library(microbenchmark)
microbenchmark(comp_list1(x1), comp_list2(x1), comp_list3(x1), comp_list4(x1))
#> Unit: microseconds
#>            expr      min        lq      mean   median        uq      max neval cld
#>  comp_list1(x1)  138.327  148.5955  171.9481  162.013  188.9315  269.342   100 a  
#>  comp_list2(x1) 1023.932 1125.2210 1387.6268 1255.985 1403.1885 3458.597   100  b 
#>  comp_list3(x1) 1130.275 1275.9940 1511.7916 1378.789 1550.8240 3254.292   100   c
#>  comp_list4(x1)  138.075  144.8635  169.7833  159.954  185.1515  298.282   100 a  
microbenchmark(comp_list1(x2), comp_list2(x2), comp_list3(x2), comp_list4(x2))
#> Unit: microseconds
#>            expr     min        lq      mean   median        uq      max neval cld
#>  comp_list1(x2) 139.492  140.3540  147.7695  145.380  149.6495  218.800   100  a 
#>  comp_list2(x2) 995.373 1030.4325 1179.2274 1054.711 1136.5050 3763.506   100   b
#>  comp_list3(x2) 977.805 1029.7310 1134.3650 1049.684 1086.0730 2846.592   100   b
#>  comp_list4(x2) 135.516  136.4685  150.7185  139.030  146.7170  345.985   100  a

As we can see, the most efficient solutions are those based on the duplicated and unique functions.
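Speed aside, the four functions do not all test the same notion of equality: comp_list3 uses ==, while comp_list2 uses identical. The sketch below uses only base R to show where the two part ways; the variable names are illustrative.

```r
ints <- 1:3          # integer vector
dbls <- c(1, 2, 3)   # double vector holding the same values

identical(ints, dbls)
# [1] FALSE  (different storage types)
all(ints == dbls)
# [1] TRUE   (== compares values after coercion)

with_na <- c(1, NA, 3)
identical(with_na, with_na)
# [1] TRUE
all(with_na == with_na)
# [1] NA     (a comparison involving NA is NA)
```

So an ==-based check may accept vectors of different types, or return NA, where an identical-based check would not.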

Solution 4:[4]

Putting in my self-promoting suggestion for cgwtools::approxeq, which essentially does what all.equal does but returns a vector of logical values indicating equality or not.

So it depends on whether you want exact equality or floating-point-representational equality.
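To illustrate the distinction with base R only, here is a sketch that uses all.equal rather than cgwtools::approxeq. The helper name all_near_equal is made up for this example.

```r
# Hypothetical helper: TRUE if every element is all.equal() to the first.
# Extra arguments (e.g. tolerance) are passed through to all.equal().
all_near_equal <- function(x, ...) {
  if (length(x) < 2L) return(TRUE)
  all(vapply(x[-1L],
             function(el) isTRUE(all.equal(x[[1L]], el, ...)),
             logical(1L)))
}

vals <- list(0.3, 0.1 + 0.2, 0.3)

identical(vals[[1]], vals[[2]])
# [1] FALSE  (the two doubles differ in their last bits)
all_near_equal(vals)
# [1] TRUE   (equal within all.equal's default tolerance)
```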

Solution 5:[5]

UPDATE

The overall best solution:

all.identical.list <- function(l) identical(unname(l[-length(l)]), unname(l[-1]))
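This works because dropping the last element and dropping the first element gives two lists that line up each element with its successor, so a single identical call checks every adjacent pair. A small sketch of why unname is needed, using the question's test lists:

```r
test_true <- list(a = c(1, 2, 3), b = c(1, 2, 3), d = c(1, 2, 3))
test_false <- list(a = c(1, 2, 3), b = c(1, 2, 3), d = c(1, 32, 13))

# Without unname(), the shifted lists carry different names
# (a, b versus b, d), so identical() fails even for equal contents.
identical(test_true[-length(test_true)], test_true[-1])
# [1] FALSE

# With names stripped, only the contents are compared.
identical(unname(test_true[-length(test_true)]), unname(test_true[-1]))
# [1] TRUE
identical(unname(test_false[-length(test_false)]), unname(test_false[-1]))
# [1] FALSE
```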

Implementing Frank's solution with a break:

all.identical <- function(l) class(try(Reduce(function(x, y) if(identical(x, y)) x else break, l), silent = TRUE)) != "try-error"

Continuing with Artem's benchmarking and adding the solution from Jake's comment, speeds are pretty dependent on the objects being compared, but all.identical.list is consistently the fastest (or very close to fastest):

library(microbenchmark)

all.identical.list <- function(l) identical(unname(l[-length(l)]), unname(l[-1]))
all.identical <- function(l) !is.null(Reduce(function(x, y) if(identical(x, y)) x else NULL, l))
all.identical.break <- function(l) class(try(Reduce(function(x, y) if(identical(x, y)) x else break, l), silent = TRUE)) != "try-error"
comp_list4 <- function(l) sum(duplicated.default(l)) == length(l) - 1L
comp_list5 <- function(l) all(duplicated.default(l)[-1])

x1 <- as.list(as.data.frame(replicate(1000, 1:100)))
x2 <- as.list(as.data.frame(replicate(1000, sample(100))))
microbenchmark(all.identical.list(x1), all.identical(x1), all.identical.break(x1), comp_list4(x1), comp_list5(x1), check = "equal")
#> Unit: microseconds
#>                     expr    min      lq     mean  median      uq    max neval
#>   all.identical.list(x1)   60.3   66.65  125.803   72.90   94.30 3271.5   100
#>        all.identical(x1) 1134.0 1209.45 1484.864 1265.85 1655.95 5085.3   100
#>  all.identical.break(x1) 1156.6 1226.75 1602.869 1337.25 1698.05 5030.4   100
#>           comp_list4(x1)  170.5  179.35  234.169  184.75  200.40 2164.1   100
#>           comp_list5(x1)  173.3  182.35  213.542  187.55  194.50 1704.0   100
microbenchmark(all.identical.list(x2), all.identical(x2), all.identical.break(x2), comp_list4(x2), comp_list5(x2), check = "equal")
#> Unit: microseconds
#>                     expr    min      lq     mean  median      uq    max neval
#>   all.identical.list(x2)   31.0   34.30   47.182   37.65   46.90  180.8   100
#>        all.identical(x2) 1002.8 1059.85 1237.426 1106.65 1278.35 3404.4   100
#>  all.identical.break(x2)  119.4  137.15  156.748  147.60  164.00  340.8   100
#>           comp_list4(x2)  165.0  172.35  189.869  181.20  192.25  334.6   100
#>           comp_list5(x2)  166.6  171.10  188.782  179.25  190.55  394.9   100
x1 <- as.list(as.data.frame(replicate(10, 1:1e5)))
x2 <- as.list(as.data.frame(replicate(10, sample(1e5))))
microbenchmark(all.identical.list(x1), all.identical(x1), all.identical.break(x1), comp_list4(x1), comp_list5(x1), check = "equal")
#> Unit: microseconds
#>                     expr    min      lq     mean median      uq    max neval
#>   all.identical.list(x1)  211.4  217.25  264.978  229.5  258.00  711.4   100
#>        all.identical(x1)  182.2  187.50  218.062  195.3  217.05  499.4   100
#>  all.identical.break(x1)  194.8  207.25  258.043  222.7  266.70 1013.4   100
#>           comp_list4(x1) 1457.3 1495.30 1659.118 1543.0 1806.75 2689.0   100
#>           comp_list5(x1) 1457.7 1502.45 1685.194 1553.5 1769.10 3021.2   100
microbenchmark(all.identical.list(x2), all.identical(x2), all.identical.break(x2), comp_list4(x2), comp_list5(x2), check = "equal")
#> Unit: microseconds
#>                     expr    min      lq     mean  median      uq    max neval
#>   all.identical.list(x2)    3.1    4.45    7.894    6.35    9.85   48.5   100
#>        all.identical(x2)   12.0   15.25   19.404   17.05   22.05   56.1   100
#>  all.identical.break(x2)  114.3  128.80  172.876  144.90  190.45  511.5   100
#>           comp_list4(x2) 1292.2 1342.35 1443.261 1397.00 1472.25 1908.5   100
#>           comp_list5(x2) 1292.4 1364.90 1478.291 1409.50 1484.80 2467.2   100

Solution 6:[6]

This also works, printing the result of each pairwise comparison:

m <- combn(length(test_true),2)

for (i in seq_len(ncol(m))) {
  print(all(test_true[[m[1, i]]] == test_true[[m[2, i]]]))
}
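To collapse the pairwise checks into a single logical value, one option is combn's FUN argument together with identical. This is only a sketch, and the name all_pairs_identical is hypothetical.

```r
# Hypothetical helper: TRUE if every pair of elements is identical.
all_pairs_identical <- function(x) {
  n <- length(x)
  # Assumption: a list with 0 or 1 elements counts as "all equal".
  if (n < 2L) return(TRUE)
  # combn() applies FUN to each pair of indices and returns a logical vector.
  all(combn(n, 2L, FUN = function(idx) identical(x[[idx[1L]]], x[[idx[2L]]])))
}

test_true <- list(a = c(1, 2, 3), b = c(1, 2, 3), d = c(1, 2, 3))
test_false <- list(a = c(1, 2, 3), b = c(1, 2, 3), d = c(1, 32, 13))

all_pairs_identical(test_true)
# [1] TRUE
all_pairs_identical(test_false)
# [1] FALSE
```

Note that this performs n(n-1)/2 comparisons for a list of length n, whereas comparing each element to the first needs only n-1.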

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 flodel
Solution 3
Solution 4 Carl Witthoft
Solution 5
Solution 6