Grouped data frame to list

I've got a data frame that contains names that are grouped, like so:

df <- data.frame(group = rep(letters[1:2], each=2),
                 name = LETTERS[1:4])
> df
  group name
1     a    A
2     a    B
3     b    C
4     b    D

I would like to convert this into a list that is keyed on the group names and contains the names. Example output:

df_out <- list(a=c('A', 'B'),
               b=c('C', 'D'))

> df_out
$a
[1] "A" "B"

$b
[1] "C" "D"

This is not a new question, but I would like to do this wholly within the tidyverse.



Solution 1:[1]

There is no such function yet in the tidyverse as far as I know. Thus, you will have to write your own:

library(magrittr)  # provides %>%
split_tibble <- function(tibble, col = 'col') tibble %>% split(., .[[col]])

Then:

dflist <- split_tibble(df, 'group')

results in a list of dataframes:

> dflist
$a
  group name
1     a    A
2     a    B

$b
  group name
3     b    C
4     b    D

> sapply(dflist, class)
           a            b 
"data.frame" "data.frame"

To get the desired output, you'll have to extend the function a bit:

split_tibble <- function(tibble, column = 'col') {
  # split on the grouping column, then keep only the remaining column(s) of each piece
  tibble %>% split(., .[[column]]) %>% lapply(function(x) x[, setdiff(names(x), column), drop = TRUE])
}

Now:

split_tibble(df, 'group')

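results in (the factor output shown is what R versions before 4.0 produce, where data.frame() converts strings to factors by default):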

$a
[1] A B
Levels: A B C D

$b
[1] C D
Levels: A B C D

Considering the alternatives in the comments and the other answers, the conclusion is that the base R alternative split(df$name, df$group) is the much wiser choice.
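For reference, a minimal sketch of that base R call on the example data. It passes stringsAsFactors = FALSE explicitly (an assumption added here, not part of the original question) so that name is a character vector regardless of R version:

```r
# Example data from the question; stringsAsFactors = FALSE keeps `name` as character
df <- data.frame(group = rep(letters[1:2], each = 2),
                 name = LETTERS[1:4],
                 stringsAsFactors = FALSE)

# split(x, f) divides the vector x into a list keyed by the values of f
df_out <- split(df$name, df$group)
str(df_out)
```

No packages are needed for this version, which is the main argument in its favour.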

Solution 2:[2]

Use tidyverse

library(tidyr)
library(dplyr)

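df$ID <- seq_len(nrow(df))  # unique row ID so spread() can tell the rows apart
# one column per group; cells belonging to other groups become NA
lst <- df %>% spread(group, name) %>% select(-ID) %>% as.list()
# drop the NA padding from each element
lapply(lst, function(x) x[!is.na(x)])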

Solution 3:[3]

With dplyr's group_map function (added in dplyr 0.8.0), you can write:

library(dplyr)
lst <- df %>% group_by(group) %>% 
              group_map(~.x)

The group_map function applies a function to each group and returns the results as a list. The function can be given as a formula, in which .x refers to the rows of the current group, so the identity formula is ~.x.
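To illustrate the formula shorthand, the sketch below pairs it with what should be an equivalent call using an explicit two-argument function. The argument names rows and key are arbitrary placeholders chosen for this example, and stringsAsFactors = FALSE is added so the result does not depend on the R version.

```r
library(dplyr)

df <- data.frame(group = rep(letters[1:2], each = 2),
                 name = LETTERS[1:4],
                 stringsAsFactors = FALSE)

# Formula form: .x is the data for the current group
by_formula <- df %>% group_by(group) %>% group_map(~ .x)

# Explicit function form: first argument is the group's rows, second is its key
by_function <- df %>% group_by(group) %>% group_map(function(rows, key) rows)

# Each element holds only the non-grouping columns of one group
by_function[[1]]
```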

Getting a named list out of this is not straightforward, because there could be multiple grouping columns. It is therefore currently easier to set the names yourself:

names(lst) <- df %>% group_by(group) %>%
                     group_map(~.y) %>%
                     unlist()

In the formula notation, .y refers to the group key (which in your case will be either "a" or "b").
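Note that group_map(~.x) yields a list of data frames rather than the character vectors asked for in the question. A possible variation, assuming dplyr 0.8.0 or later, extracts the name column inside the formula and takes the list names from group_keys(). It assumes a single grouping column called group, as in the example, and would need adapting for several grouping columns.

```r
library(dplyr)

df <- data.frame(group = rep(letters[1:2], each = 2),
                 name = LETTERS[1:4],
                 stringsAsFactors = FALSE)

grouped <- df %>% group_by(group)

# Return just the name vector for each group instead of the whole data frame
lst <- grouped %>% group_map(~ .x$name)

# group_keys() gives one row per group, in the same order as group_map's output
names(lst) <- group_keys(grouped)$group

str(lst)
```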

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 merv
Solution 3