Native pipe with purrr::map_dfr()
I'd like to use the new native pipe, |>, with purrr::map_dfr(). (To make it reproducible, I'm passing the datasets as strings instead of paths, but that shouldn't make a difference.)
csvs <- c(
  "csv_a" = "a,b,c\n1,2,3\n4,5,6",
  "csv_b" = "a,b,c\n-1,-2,-3"
)
col_types <- readr::cols(.default = readr::col_character())
# Approach 1
csvs |>
  purrr::map_dfr(
    .f = function(p) {
      readr::read_csv(
        file = I(p),
        col_types = col_types
      )
    }
  )
# Approach 2
library(magrittr)
csvs %>%
  purrr::map_dfr(
    .x = .,
    .f = ~readr::read_csv(
      file = I(.),
      col_types = col_types
    )
  )
I have two questions, mostly to continue my understanding of the native pipe.
Question 1
How do I replace the explicit function(p) part with the new {\(x)...}() syntax? The attempt below throws "Error in standardise_path(file) : argument "p" is missing, with no default".
csvs |>
  purrr::map_dfr(
    .f =
      {\(p)
        readr::read_csv(
          file = I(p),
          col_types = col_types
        )
      }()
  )
Question 2
Can I also mimic the magrittr approach (#2)? This somehow reads each row twice, including the header.
csvs |>
  {\(p)
    purrr::map_dfr(
      .x = p,
      .f = ~readr::read_csv(
        file = I(p),
        col_types = col_types
      )
    )
  }()
# Produces
# A tibble: 8 x 3
#   a     b     c
#   <chr> <chr> <chr>
# 1 1     2     3
# 2 4     5     6
# 3 a     b     c
# 4 -1    -2    -3
# 5 1     2     3
# 6 4     5     6
# 7 a     b     c
# 8 -1    -2    -3
Edit: In response to @MrFlick's comment, I've wrapped the argument to file with I() in case that becomes a requirement in future versions of readr (it seems to work fine now without it). If you're passing typical file paths (instead of literal strings), remove the call to I().
Solution 1:[1]
Answer for Question 1: pass the lambda itself as .f, without the surrounding {...}():
csvs |>
  purrr::map_dfr(
    .f = \(k) {
      readr::read_csv(
        file = k,
        col_types = col_types
      )
    }
  )
#   a     b     c
#   <chr> <chr> <chr>
# 1 1     2     3
# 2 4     5     6
# 3 -1    -2    -3
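To see why the attempt in Question 1 fails, it may help to compare passing a lambda with calling it. The following is a minimal base R sketch, assuming R 4.1 or later; the names double_it, passed, and called are illustrative and do not come from the original posts. Writing {\(p) ...}() invokes the function immediately with no arguments, so p is already missing before the mapper ever runs.

```r
# Hypothetical helper names; base R only, so purrr and readr are not needed.
double_it <- \(p) p * 2

# Passing the function object: lapply() calls it once per element.
passed <- lapply(c(a = 1, b = 2), double_it)

# Calling the lambda with no arguments, as `{\(p) ...}()` does,
# fails because `p` has no value. The error message is captured as a string.
called <- tryCatch(
  {\(p) p * 2}(),
  error = function(e) conditionMessage(e)
)

print(passed)
print(called)
```

The same distinction applies to .f in purrr::map_dfr(): it expects a function, not the result of calling one.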
Solution 2:[2]
Answer for Question 2: in the inner function you use p, which refers to the whole csvs vector on every call. The inner function therefore ignores the element it's mapping over and reads the entire vector each time. You can avoid that by using the .x pronoun:
csvs |>
  {\(p)
    purrr::map_dfr(
      .x = p,
      .f = ~readr::read_csv(
        file = I(.x),
        col_types = col_types
      )
    )
  }()
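The scoping issue can be reproduced without purrr or readr. This is a rough base R sketch, assuming R 4.1 or later; csv_names, uses_outer, and uses_element are made-up names for illustration. In both cases the inner lambda receives one element as el, but only the second one actually uses it.

```r
# Illustrative stand-in for the csvs vector (hypothetical values).
csv_names <- c(first = "a", second = "b")

# Mirrors the bug: the inner lambda ignores `el` and uses the outer `p`,
# so every call sees the whole vector.
uses_outer <- (\(p) lapply(p, \(el) paste(p, collapse = "+")))(csv_names)

# Mirrors the fix: the inner lambda uses its own argument.
uses_element <- (\(p) lapply(p, \(el) paste(el, collapse = "+")))(csv_names)

print(uses_outer)
print(uses_element)
```

In the purrr formula version, .x plays the role of el here, while p remains the full vector captured from the enclosing function.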
Stylistically, it might be nicer to avoid the formula mapper altogether, since you don't have any custom behavior in your function. The ... in purrr::map_dfr will be passed on to the function on each call (see note 1).
csvs |>
  {\(p) purrr::map_dfr(.x = p, .f = readr::read_csv, col_types = col_types)}()
Since you don't reuse the p argument, the anonymous function is also unnecessary:
csvs |>
  purrr::map_dfr(.f = readr::read_csv, col_types = col_types)
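Base R's lapply() forwards its ... to the mapped function in the same way, which gives a dependency-free way to try the idea. The sketch below is an analog, not the purrr/readr code from the answer: read_text_csv is a hypothetical helper (needed because utils::read.csv() takes literal text through its text argument), and rbind stands in for the row-binding that map_dfr performs.

```r
# Same literal CSV strings as in the question.
csvs <- c(
  "csv_a" = "a,b,c\n1,2,3\n4,5,6",
  "csv_b" = "a,b,c\n-1,-2,-3"
)

# Hypothetical helper: read.csv() accepts literal text via `text`, not `file`.
read_text_csv <- \(s, ...) utils::read.csv(text = s, ...)

# `colClasses` travels through lapply()'s `...` to every read_text_csv() call,
# so no anonymous wrapper function is needed.
combined <- csvs |>
  lapply(read_text_csv, colClasses = "character") |>
  do.call(what = rbind)

print(combined)
```

Unlike the tibble returned by purrr::map_dfr(), the result here is a plain data frame whose row names are derived from the names of csvs.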
Note 1: @MrFlick is correct that, in principle, I() should be used when you pass literal strings instead of a file name. In your case you do not need it, because every string in the csvs vector contains a newline. See here for details. I took it out to illustrate your alternatives.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | wibeasley |
Solution 2 | Bob Zimmermann |