Grouped data frame to list
I've got a data frame that contains names that are grouped, like so:
df <- data.frame(group = rep(letters[1:2], each=2),
name = LETTERS[1:4])
> df
  group name
1     a    A
2     a    B
3     b    C
4     b    D
I would like to convert this into a list that is keyed on the group names and contains the names. Example output:
df_out <- list(a=c('A', 'B'),
b=c('C', 'D'))
> df_out
$a
[1] "A" "B"
$b
[1] "C" "D"
This is not a new question but I would like to do this wholly within the tidyverse.
Solution 1:[1]
There is no such function yet in the tidyverse as far as I know. Thus, you will have to write your own:
split_tibble <- function(tibble, col = 'col') tibble %>% split(., .[, col])
Then:
dflist <- split_tibble(df, 'group')
results in a list of dataframes:
> dflist
$a
  group name
1     a    A
2     a    B

$b
  group name
3     b    C
4     b    D

> sapply(dflist, class)
           a            b
"data.frame" "data.frame"
To get the desired output, you'll have to extend the function a bit:
split_tibble <- function(tibble, column = 'col') {
  tibble %>%
    split(., .[, column]) %>%
    lapply(., function(x) x[, setdiff(names(x), column)])
}
Now:
split_tibble(df, 'group')
results in:
$a
[1] A B
Levels: A B C D

$b
[1] C D
Levels: A B C D
Considering the alternatives in the comments and in both answers leads to the following conclusion: using the base R alternative split(df$name, df$group) is much wiser.
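For reference, here is a minimal sketch of that base R call on the example data. It assumes the name column is character rather than factor (hence the explicit stringsAsFactors = FALSE, which is the default from R 4.0.0 onwards), so the result holds plain character vectors as in the desired output.

```r
# Example data from the question; stringsAsFactors = FALSE is an assumption
# made here so that `name` stays character instead of becoming a factor.
df <- data.frame(group = rep(letters[1:2], each = 2),
                 name = LETTERS[1:4],
                 stringsAsFactors = FALSE)

# split() takes the values to divide and the vector that defines the groups;
# the list names come from the unique values of df$group.
df_out <- split(df$name, df$group)

str(df_out)
```

If you want to keep this inside a pipe, with(df, split(name, group)) gives the same result.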
Solution 2:[2]
Using the tidyverse:
library(tidyr)
library(dplyr)
df$ID <- 1:nrow(df) # unique variable
lst <- df %>% spread(group, name) %>% select(-ID) %>% as.list()
lapply(lst, function(x) x[!is.na(x)])
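The same idea can be sketched with pivot_wider(), which newer tidyr versions provide as the successor to spread(). This is only an illustration under two assumptions: tidyr 1.0.0 or later is installed, and name is a character column. The helper column ID plays the same role as above, keeping each row distinct so that nothing is collapsed during the reshape.

```r
library(dplyr)
library(tidyr)

# Example data from the question (character columns assumed).
df <- data.frame(group = rep(letters[1:2], each = 2),
                 name = LETTERS[1:4],
                 stringsAsFactors = FALSE)

lst <- df %>%
  mutate(ID = row_number()) %>%                         # unique row identifier
  pivot_wider(names_from = group, values_from = name) %>% # one column per group, NA elsewhere
  select(-ID) %>%
  as.list()

# Drop the NA padding that the reshape introduces in each column.
df_out <- lapply(lst, function(x) x[!is.na(x)])

str(df_out)
```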
Solution 3:[3]
With the new group_map function of dplyr, you can write
library(dplyr)
lst <- df %>% group_by(group) %>%
group_map(~.x)
The group_map function applies a function to each group and returns the results as a list. The function can be specified as a formula, and the identity formula is then ~.x.
It is complicated to get the result as a named list using the above method because there could be multiple grouping columns. Therefore, it is currently easier to set the names yourself:
names(lst) <- df %>% group_by(group) %>%
group_map(~.y) %>%
unlist()
In the formula notation, .y refers to the group key (which in your case will be either "a" or "b").
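Note that group_map(~.x) yields one data frame per group rather than a bare vector. A possible variation that returns the vectors from the question directly is sketched below. It assumes dplyr 0.8.0 or later (for group_map and group_keys), a single grouping column, and a character name column; the intermediate object grouped is just an illustrative name.

```r
library(dplyr)

# Example data from the question (character columns assumed).
df <- data.frame(group = rep(letters[1:2], each = 2),
                 name = LETTERS[1:4],
                 stringsAsFactors = FALSE)

grouped <- df %>% group_by(group)

df_out <- grouped %>%
  group_map(~ .x$name) %>%              # pull the name vector out of each group
  setNames(group_keys(grouped)$group)   # keys come back in the same order as the groups

str(df_out)
```

With more than one grouping column, group_keys() returns several columns, so you would have to decide how to combine them into names.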
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | merv |
Solution 3 | |