'Combination of all pairs of rows using R

Here is my dataset:

data <- read.table(header = TRUE, text = "
 group index group_index  x  y  z
     a     1          a1 12 13 14
     a     2          a2 15 20 22
     b     1          b1 24 17 28
     b     2          b2 12 19 30
     b     3          b3 31 32 33  ")

For each case in group "a" and each case in group "b", I wanna combine their x, y, z values in a row, so the data matrix or dataframe I want will look like:

      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
[1,] a1_b1   12    13    14    24    17    28  # x,y,z for a1, follows by x,y,z for b1
[2,] a1_b2   12    13    14    12    19    30  # x,y,z for a1, follows by x,y,z for b2  
[3,] a1_b3   12    13    14    31    32    33
[4,] a2_b1   15    20    22    24    17    28  # x,y,z for a2, follows by x,y,z for b1
[5,] a2_b2   15    20    22    12    19    30
[6,] a2_b3   15    20    22    31    32    33

I'm wondering how to achieve this goal? Thanks so much!



Solution 1:[1]

We can split data based on group and take a cartesian product using merge

list_df <- split(data[c("x", "y", "z")], data$group)
out <- merge(list_df[[1]], list_df[[2]], by = NULL)
out[do.call(order, out), ]

#  x.x y.x z.x x.y y.y z.y
#3  12  13  14  12  19  30
#1  12  13  14  24  17  28
#5  12  13  14  31  32  33
#4  15  20  22  12  19  30
#2  15  20  22  24  17  28
#6  15  20  22  31  32  33

Solution 2:[2]

You could also do a join on non-matching group values (< instead of != to avoid repeating pairs)

library(data.table)
setDT(data)

data[data, on = .(group < group), 
     .(g = paste0(group_index, '_', i.group_index), 
       x, y, z, i.x, i.y, i.z),  
     nomatch = NULL]

#        g  x  y  z i.x i.y i.z
# 1: a1_b1 12 13 14  24  17  28
# 2: a2_b1 15 20 22  24  17  28
# 3: a1_b2 12 13 14  12  19  30
# 4: a2_b2 15 20 22  12  19  30
# 5: a1_b3 12 13 14  31  32  33
# 6: a2_b3 15 20 22  31  32  33

Solution 3:[3]

A simple solution using dplyr:

library(tidyverse)

dcross <- left_join(data, data, by=character(), suffix=c("1", "2")) |>
    filter(group1 != group2)

#    index1 group_index1 x1 y1 index2 group_index2 x2 y2
# 1       1           a1 12 13      1           b1 24 17
# 2       1           a1 12 13      2           b2 12 19
# 3       1           a1 12 13      3           b3 31 32
# 4       2           a2 15 20      1           b1 24 17
# 5       2           a2 15 20      2           b2 12 19
# 6       2           a2 15 20      3           b3 31 32

And to get the described matrix from the dataframe

dcross |>
    select(matches("^[xyz]\\d")) |>
    as.matrix()

#       x1 y1 z1 x2 y2 z2
#  [1,] 12 13 14 24 17 28
#  [2,] 12 13 14 12 19 30
#  [3,] 12 13 14 31 32 33
#  [4,] 15 20 22 24 17 28
#  [5,] 15 20 22 12 19 30
#  [6,] 15 20 22 31 32 33

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ronak Shah
Solution 2
Solution 3 Brendan Case