'How do I populate upper.tri of matrix with matched integers from the lower.tri?

Issue

I have a dataframe of familial relationships coded with integers, where R01 is the relationship of person N to person 1, R02 their relationship to person 2 etc.

However, only the lower.tri of each family matrix is coded, so I am trying to write a function to match the correct relationship in the upper.tri.

Relationships

The relationships are coded in integers as follows:

1 = Spouse, 2 = Cohabiting partner, 3 = Son/daughter, 4 = Step son/daughter, 5 = Foster child, 6 = Son-in-law/daughter-in-law, 7 = Parent/guardian, 8 = Step-parent, 9 = Foster parent, 10 = Parent-in-law, 11 = Brother/sister, 12 = Step-brother/sister, 13 = Foster brother/sister, 14 = Brother/sister-in-law, 15 = Grand-child, 16 = Grand-parent, 17 = Other relative, 18 = Other non-relative.

thus the relationships are:

rel = c("1" = 1, "2" = 2, "3" = 7, "4" = 8, "5" = 9, "6" = 10, "7" = 3, "8" = 4, "9" = 5, "10" = 6, "11" = 11, "12" = 12, "13" = 13, "14" = 14, "15" = 16, "16" = 15, "17" = 17, "18" = 18)

Example Data

   household person R01 R02 R03 R04 R05 R06
1          1      1  NA  NA  NA  NA  NA  NA
2          1      2   1  NA  NA  NA  NA  NA
3          1      3   3   3  NA  NA  NA  NA
4          1      4   3   3  11  NA  NA  NA
5          2      1  NA  NA  NA  NA  NA  NA
6          2      2   3  NA  NA  NA  NA  NA
7          2      3  15   3  NA  NA  NA  NA
8          3      1  NA  NA  NA  NA  NA  NA
9          3      2  18  NA  NA  NA  NA  NA
10         4      1  NA  NA  NA  NA  NA  NA
11         5      1  NA  NA  NA  NA  NA  NA
12         5      2   5  NA  NA  NA  NA  NA

Required Output

   household person R01 R02 R03 R04 R05 R06
1          1      1  NA   1   7   7  NA  NA
2          1      2   1  NA   7   7  NA  NA
3          1      3   3   3  NA  11  NA  NA
4          1      4   3   3  11  NA  NA  NA
5          2      1  NA   1  16  NA  NA  NA
6          2      2   3  NA   1  NA  NA  NA
7          2      3  15   3  NA  NA  NA  NA
8          3      1  NA  18  NA  NA  NA  NA
9          3      2  18  NA  NA  NA  NA  NA
10         4      1  NA  NA  NA  NA  NA  NA
11         5      1  NA   9  NA  NA  NA  NA
12         5      2   5  NA  NA  NA  NA  NA

Example Code

df <- data.frame(household = c(1,1,1,1,2,2,2,3,3,4,5,5),
                 person = c(1,2,3,4,1,2,3,1,2,1,1,2),
                 R01 = c(NA, 1, 3, 3, NA, 3, 15, NA, 18, NA, NA, 5),
                 R02 = c(NA, NA, 3, 3, NA, NA, 3, rep(NA, 5)),
                 R03 = c(rep(NA,3), 11, rep(NA, 8)),
                 R04 = rep(NA, 12),
                 R05 = rep(NA, 12),
                 R06 = rep(NA, 12))

I know it's possible to write a function to do the matrix match and then apply it to each household with dplyr, however I'm not great at functions yet so I'm running into issues in a few areas.



Solution 1:[1]

You can make the relationship matrix symmetric in each household, and at the same time recode the elements according to rel.

library(dplyr)

df %>%
  group_by(household) %>%
  group_modify(~ {
    mat <- as.matrix(.x[-1][1:nrow(.x)])
    mat[upper.tri(mat)] <- recode(t(mat)[upper.tri(mat)], !!!rel)
    cbind(.x[1], mat)
  }) %>% 
  ungroup()

# A tibble: 12 × 6
   household person   R01   R02   R03   R04
       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
 1         1      1    NA     1     7     7
 2         1      2     1    NA     7     7
 3         1      3     3     3    NA    11
 4         1      4     3     3    11    NA
 5         2      1    NA     7    16    NA
 6         2      2     3    NA     7    NA
 7         2      3    15     3    NA    NA
 8         3      1    NA    18    NA    NA
 9         3      2    18    NA    NA    NA
10         4      1    NA    NA    NA    NA
11         5      1    NA     9    NA    NA
12         5      2     5    NA    NA    NA

Solution 2:[2]

Here's a way to do it mostly using base R.

First, create f, a function that replace the upper triangle of a matrix with the matching value from the rel vector and the lower triangle of the same matrix.

Then, split your data according to the household, compute the lengths of each group so that the resulting matrix has the right number of columns, and then apply the function to each groups. Finally, bind_rows and cbind with the original data set.

f <- function(m) {
  m[upper.tri(m)] <- match(t(m)[upper.tri(m)], rel)
  m
}

l <- split(df[3:6], df$household)
len <- lapply(l, \(l) ncol(l) - (sum(sapply(l, \(x) any(!is.na(x)))) + 1))
l <- mapply(\(x, y) x[1:(length(x) - y)], l, len, SIMPLIFY = F)

cbind(df[1:2],
      dplyr::bind_rows(lapply(l, f)))

output

   household person R01 R02 R03 R04
1          1      1  NA   1   7   7
2          1      2   1  NA   7   7
3          1      3   3   3  NA  11
4          1      4   3   3  11  NA
5          2      1  NA   7  16  NA
6          2      2   3  NA   7  NA
7          2      3  15   3  NA  NA
8          3      1  NA  18  NA  NA
9          3      2  18  NA  NA  NA
10         4      1  NA  NA  NA  NA
11         5      1  NA   9  NA  NA
12         5      2   5  NA  NA  NA

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Maël