'check whether all elements of a list are in equal in R
I have a list of several vectors. I would like to check whether all vectors in the list are equal. There's identical
which only works for pairwise comparison. So I wrote the following function which looks ugly to me. Still I did not find a better solution. Here's my RE:
test_true <- list(a=c(1,2,3),b=c(1,2,3),d=c(1,2,3))
test_false <- list(a=c(1,2,3),b=c(1,2,3),d=c(1,32,13))
compareList <- function(li){
stopifnot(length(li) > 1)
l <- length(li)
res <- lapply(li[-1],function(X,x) identical(X,x),x=li[[1]])
res <- all(unlist(res))
res
}
compareList(test_true)
compareList(test_false)
Any suggestions? Are there any native checks for identical for more than just pairwise comparison?
Solution 1:[1]
How about
allSame <- function(x) length(unique(x)) == 1
allSame(test_true)
# [1] TRUE
allSame(test_false)
# [1] FALSE
As @JoshuaUlrich pointed out below, unique
may be slow on lists. Also, identical
and unique
may use different criteria. Reduce
is a function I recently learned about for extending pairwise operations:
identicalValue <- function(x,y) if (identical(x,y)) x else FALSE
Reduce(identicalValue,test_true)
# [1] 1 2 3
Reduce(identicalValue,test_false)
# [1] FALSE
This inefficiently continues making comparisons after finding one non-match. My crude solution to that would be to write else break
instead of else FALSE
, throwing an error.
Solution 2:[2]
I woud do:
all.identical <- function(l) all(mapply(identical, head(l, 1), tail(l, -1)))
all.identical(test_true)
# [1] TRUE
all.identical(test_false)
# [1] FALSE
Solution 3:[3]
To summarize the solutions. Data for the tests:
x1 <- as.list(as.data.frame(replicate(1000, 1:100)))
x2 <- as.list(as.data.frame(replicate(1000, sample(1:100, 100))))
Solutions:
comp_list1 <- function(x) length(unique.default(x)) == 1L
comp_list2 <- function(x) all(vapply(x[-1], identical, logical(1L), x = x[[1]]))
comp_list3 <- function(x) all(vapply(x[-1], function(x2) all(x[[1]] == x2), logical(1L)))
comp_list4 <- function(x) sum(duplicated.default(x)) == length(x) - 1L
Test on the data:
for (i in 1:4) cat(match.fun(paste0("comp_list", i))(x1), " ")
#> TRUE TRUE TRUE TRUE
for (i in 1:4) cat(match.fun(paste0("comp_list", i))(x2), " ")
#> FALSE FALSE FALSE FALSE
Benchmarks:
library(microbenchmark)
microbenchmark(comp_list1(x1), comp_list2(x1), comp_list3(x1), comp_list4(x1))
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> comp_list1(x1) 138.327 148.5955 171.9481 162.013 188.9315 269.342 100 a
#> comp_list2(x1) 1023.932 1125.2210 1387.6268 1255.985 1403.1885 3458.597 100 b
#> comp_list3(x1) 1130.275 1275.9940 1511.7916 1378.789 1550.8240 3254.292 100 c
#> comp_list4(x1) 138.075 144.8635 169.7833 159.954 185.1515 298.282 100 a
microbenchmark(comp_list1(x2), comp_list2(x2), comp_list3(x2), comp_list4(x2))
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> comp_list1(x2) 139.492 140.3540 147.7695 145.380 149.6495 218.800 100 a
#> comp_list2(x2) 995.373 1030.4325 1179.2274 1054.711 1136.5050 3763.506 100 b
#> comp_list3(x2) 977.805 1029.7310 1134.3650 1049.684 1086.0730 2846.592 100 b
#> comp_list4(x2) 135.516 136.4685 150.7185 139.030 146.7170 345.985 100 a
As we see the most efficient solutions based on the duplicated
and unique
functions.
Solution 4:[4]
PUtting in my self-promoting suggestion for cgwtools::approxeq
which essentially does what all.equal
does but returns a vector of logical values indicating equality or not.
So: depends whether you want exact equality or floating-point-representational equality.
Solution 5:[5]
UPDATE
The overall best solution:
all.identical.list <- function(l) identical(unname(l[-length(l)]), unname(l[-1]))
Implementing Frank's solution with a break:
all.identical <- function(l) class(try(Reduce(function(x, y) if(identical(x, y)) x else break, l), silent = TRUE)) != "try-error"
Continuing with Artem's benchmarking and adding the solution from Jake's comment, speeds are pretty dependent on the objects being compared, but all.identical.list
is consistently the fastest (or very close to fastest):
library(microbenchmark)
all.identical.list <- function(l) identical(unname(l[-length(l)]), unname(l[-1]))
all.identical <- function(l) !is.null(Reduce(function(x, y) if(identical(x, y)) x else NULL, l))
all.identical.break <- function(l) class(try(Reduce(function(x, y) if(identical(x, y)) x else break, l), silent = TRUE)) != "try-error"
comp_list4 <- function(l) sum(duplicated.default(l)) == length(l) - 1L
comp_list5 <- function(l) all(duplicated.default(l)[-1])
x1 <- as.list(as.data.frame(replicate(1000, 1:100)))
x2 <- as.list(as.data.frame(replicate(1000, sample(100))))
microbenchmark(all.identical.list(x1), all.identical(x1), all.identical.break(x1), comp_list4(x1), comp_list5(x1), check = "equal")
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> all.identical.list(x1) 60.3 66.65 125.803 72.90 94.30 3271.5 100
#> all.identical(x1) 1134.0 1209.45 1484.864 1265.85 1655.95 5085.3 100
#> all.identical.break(x1) 1156.6 1226.75 1602.869 1337.25 1698.05 5030.4 100
#> comp_list4(x1) 170.5 179.35 234.169 184.75 200.40 2164.1 100
#> comp_list5(x1) 173.3 182.35 213.542 187.55 194.50 1704.0 100
microbenchmark(all.identical.list(x2), all.identical(x2), all.identical.break(x2), comp_list4(x2), comp_list5(x2), check = "equal")
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> all.identical.list(x2) 31.0 34.30 47.182 37.65 46.90 180.8 100
#> all.identical(x2) 1002.8 1059.85 1237.426 1106.65 1278.35 3404.4 100
#> all.identical.break(x2) 119.4 137.15 156.748 147.60 164.00 340.8 100
#> comp_list4(x2) 165.0 172.35 189.869 181.20 192.25 334.6 100
#> comp_list5(x2) 166.6 171.10 188.782 179.25 190.55 394.9 100
x1 <- as.list(as.data.frame(replicate(10, 1:1e5)))
x2 <- as.list(as.data.frame(replicate(10, sample(1e5))))
microbenchmark(all.identical.list(x1), all.identical(x1), all.identical.break(x1), comp_list4(x1), comp_list5(x1), check = "equal")
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> all.identical.list(x1) 211.4 217.25 264.978 229.5 258.00 711.4 100
#> all.identical(x1) 182.2 187.50 218.062 195.3 217.05 499.4 100
#> all.identical.break(x1) 194.8 207.25 258.043 222.7 266.70 1013.4 100
#> comp_list4(x1) 1457.3 1495.30 1659.118 1543.0 1806.75 2689.0 100
#> comp_list5(x1) 1457.7 1502.45 1685.194 1553.5 1769.10 3021.2 100
microbenchmark(all.identical.list(x2), all.identical(x2), all.identical.break(x2), comp_list4(x2), comp_list5(x2), check = "equal")
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> all.identical.list(x2) 3.1 4.45 7.894 6.35 9.85 48.5 100
#> all.identical(x2) 12.0 15.25 19.404 17.05 22.05 56.1 100
#> all.identical.break(x2) 114.3 128.80 172.876 144.90 190.45 511.5 100
#> comp_list4(x2) 1292.2 1342.35 1443.261 1397.00 1472.25 1908.5 100
#> comp_list5(x2) 1292.4 1364.90 1478.291 1409.50 1484.80 2467.2 100
Solution 6:[6]
this also works
m <- combn(length(test_true),2)
for(i in 1:ncol(m)){
print(all(test_true[[m[,i][1]]] == test_true[[m[,i][2]]]))
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | flodel |
Solution 3 | |
Solution 4 | Carl Witthoft |
Solution 5 | |
Solution 6 |