'Combination of all pairs of rows using R
Here is my dataset:
data <- read.table(header = TRUE, text = "
group index group_index x y z
a 1 a1 12 13 14
a 2 a2 15 20 22
b 1 b1 24 17 28
b 2 b2 12 19 30
b 3 b3 31 32 33 ")
For each case in group "a" and each case in group "b", I wanna combine their x, y, z values in a row, so the data matrix or dataframe I want will look like:
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] a1_b1 12 13 14 24 17 28 # x,y,z for a1, follows by x,y,z for b1
[2,] a1_b2 12 13 14 12 19 30 # x,y,z for a1, follows by x,y,z for b2
[3,] a1_b3 12 13 14 31 32 33
[4,] a2_b1 15 20 22 24 17 28 # x,y,z for a2, follows by x,y,z for b1
[5,] a2_b2 15 20 22 12 19 30
[6,] a2_b3 15 20 22 31 32 33
I'm wondering how to achieve this goal? Thanks so much!
Solution 1:[1]
We can split
data
based on group
and take a cartesian product using merge
list_df <- split(data[c("x", "y", "z")], data$group)
out <- merge(list_df[[1]], list_df[[2]], by = NULL)
out[do.call(order, out), ]
# x.x y.x z.x x.y y.y z.y
#3 12 13 14 12 19 30
#1 12 13 14 24 17 28
#5 12 13 14 31 32 33
#4 15 20 22 12 19 30
#2 15 20 22 24 17 28
#6 15 20 22 31 32 33
Solution 2:[2]
You could also do a join on non-matching group
values (< instead of != to avoid repeating pairs)
library(data.table)
setDT(data)
data[data, on = .(group < group),
.(g = paste0(group_index, '_', i.group_index),
x, y, z, i.x, i.y, i.z),
nomatch = NULL]
# g x y z i.x i.y i.z
# 1: a1_b1 12 13 14 24 17 28
# 2: a2_b1 15 20 22 24 17 28
# 3: a1_b2 12 13 14 12 19 30
# 4: a2_b2 15 20 22 12 19 30
# 5: a1_b3 12 13 14 31 32 33
# 6: a2_b3 15 20 22 31 32 33
Solution 3:[3]
A simple solution using dplyr
:
library(tidyverse)
dcross <- left_join(data, data, by=character(), suffix=c("1", "2")) |>
filter(group1 != group2)
# index1 group_index1 x1 y1 index2 group_index2 x2 y2
# 1 1 a1 12 13 1 b1 24 17
# 2 1 a1 12 13 2 b2 12 19
# 3 1 a1 12 13 3 b3 31 32
# 4 2 a2 15 20 1 b1 24 17
# 5 2 a2 15 20 2 b2 12 19
# 6 2 a2 15 20 3 b3 31 32
And to get the described matrix from the dataframe
dcross |>
select(matches("^[xyz]\\d")) |>
as.matrix()
# x1 y1 z1 x2 y2 z2
# [1,] 12 13 14 24 17 28
# [2,] 12 13 14 12 19 30
# [3,] 12 13 14 31 32 33
# [4,] 15 20 22 24 17 28
# [5,] 15 20 22 12 19 30
# [6,] 15 20 22 31 32 33
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Ronak Shah |
Solution 2 | |
Solution 3 | Brendan Case |