'R - Nested list to (wide) dataframe
I currently have the following problem: I extracted some data via the crunchbase API, resulting in a big nested list of the following structure (there are many more nested lists on several instances included, I here only display the part of the structure currently relevant for me):
> str(x[[1]])
$ uuid : chr "5f9957b0841251e6e439d757XXXXXX"
$ relationships: List of 27
..$ websites: List of 3
.. ..$ cardinality: chr "OneToMany"
.. ..$ items :'data.frame': 4 obs. of 7 variables:
.. .. ..$ properties.website_type: chr [1:4] "homepage" "facebook" "twitter" "linkedin"
.. .. ..$ properties.url : chr [1:4] "http://www.example.com" "https://www.facebook.com/example" "http://twitter.com/example" "http://www.linkedin.com/company/example"
Consider the following minimal example:
x <- list()
x[[1]] <- list(uuid = "123",
relationships = list(websites = list(items = list(
properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
properties.url = c("www.example1.com", "www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com") ) ) ) )
x[[2]] <- list(uuid = "987",
relationships = list(websites = list(items = list(
properties.website_type = c("homepage", "facebook", "twitter" ),
properties.url = c("www.example2.com", "www.fbex2.com", "www.twitterex2.com") ) ) ) )
Now, I would like to create a dataframe with the following column structure:
> x.df
uuid web.url web.facebook web.twitter web.linkedin
1 123 www.example1.com www.fbex1.com www.twitterex1.com www.linkedinex1.com
2 987 www.example2.com www.fbex2.com www.twitterex2.com <NA>
Meaning: I would like to have every uuid (a unique firm identifier) in a single column, followed by the urls of the different platforms (fb, twitter...). I tried a lot of different things with a combination of lapply()
, spread()
, and row_bind()
, yet didn't manage to make anything work. Any help on that would be appreciated.
Solution 1:[1]
dplyr
approach could be
library(dplyr)
library(tidyr)
#convert list to dataframe in long format
df <- do.call(rbind, lapply(x, data.frame, stringsAsFactors = FALSE))
#final result
df1 <- df %>%
spread(relationships.websites.items.properties.website_type, relationships.websites.items.properties.url)
which gives
uuid facebook homepage linkedin twitter
1 123 www.fbex1.com www.example1.com www.linkedinex1.com www.twitterex1.com
2 987 www.fbex2.com www.example2.com <NA> www.twitterex2.com
Sample data:
x <- list(structure(list(uuid = "123", relationships = structure(list(
websites = structure(list(items = structure(list(properties.website_type = c("homepage",
"facebook", "twitter", "linkedin"), properties.url = c("www.example1.com",
"www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com"
)), .Names = c("properties.website_type", "properties.url"
))), .Names = "items")), .Names = "websites")), .Names = c("uuid",
"relationships")), structure(list(uuid = "987", relationships = structure(list(
websites = structure(list(items = structure(list(properties.website_type = c("homepage",
"facebook", "twitter"), properties.url = c("www.example2.com",
"www.fbex2.com", "www.twitterex2.com")), .Names = c("properties.website_type",
"properties.url"))), .Names = "items")), .Names = "websites")), .Names = c("uuid",
"relationships")))
Update: In order to fix below error
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0
you would need to remove corrupted elements from input data where website_type
has one value but properties.url
has NULL
. Run this chunk of code as a pre-processing step before executing the main solution:
idx <- which(sapply(x, function(k) is.null(k$relationships$websites$items$properties.url)))
x <- x[-idx]
Sample data to test this pre-processing step:
x <- list(structure(list(uuid = "123", relationships = structure(list(
websites = structure(list(items = structure(list(properties.website_type = c("homepage",
"facebook", "twitter", "linkedin"), properties.url = c("www.example1.com",
"www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com"
)), .Names = c("properties.website_type", "properties.url"
))), .Names = "items")), .Names = "websites")), .Names = c("uuid",
"relationships")), structure(list(uuid = "987", relationships = structure(list(
websites = structure(list(items = structure(list(properties.website_type = "homepage",
properties.url = NULL), .Names = c("properties.website_type",
"properties.url"))), .Names = "items")), .Names = "websites")), .Names = c("uuid",
"relationships")), structure(list(uuid = "345", relationships = structure(list(
websites = structure(list(items = structure(list(properties.website_type = "homepage",
properties.url = NULL), .Names = c("properties.website_type",
"properties.url"))), .Names = "items")), .Names = "websites")), .Names = c("uuid",
"relationships")))
Solution 2:[2]
I know this is a clunkier solution, but it helped me seeing the process step by step (running str (x_df)
to see each result):
library(tidyverse)
# Using your example
x <- list()
x[[1]] <- list(uuid = "123",
relationships = list(websites = list(items = list(
properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
properties.url = c("www.example1.com", "www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com") ) ) ) )
x[[2]] <- list(uuid = "987",
relationships = list(websites = list(items = list(
properties.website_type = c("homepage", "facebook", "twitter" ),
properties.url = c("www.example2.com", "www.fbex2.com", "www.twitterex2.com") ) ) ) )
# --- Iterations of unnest:
x_df <- x %>% tibble::as_tibble_col( .) %>%
tidyr::unnest_wider(col = "value") %>%
tidyr::unnest_longer(col = "relationships") %>%
tidyr::unnest_wider(col = "relationships") %>%
tidyr::unnest_wider(col = "items") %>%
tidyr::unnest_longer(col = c("properties.website_type", "properties.url")) %>%
# --- Lastly, group by id:
group_by(uuid) %>%
tidyr::pivot_wider(data = .,
names_from = properties.website_type,
values_from = c("properties.url"))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Lulliter |