Apply the same function over all the elements in a folder
I have a folder on my computer containing 184,000 different .RData files. Each one is a small data frame representing one investor's transactions in a specific asset; together they cover combinations of 4,000 investors and 6,000 assets. The data has gaps, so I want to use the complete() function to fill in each data frame by adding the missing rows based on the datetime column.
I want R to apply complete() to every file in the folder, but I have no idea how.
I came up with the following lines of code as a basic idea, but I don't know how to tell R to apply them to the entire folder.
path_to_read <- "dev/test-data/investors-rdata-assetbased/" # folder containing the .RData files
path_to_save <- "dev/test-data/investors-completedatetime/"
file_names <- list.files(path_to_read, ".RData")
df$datetime <- as.Date(df$datetime, format = "%Y-%m-%d")
df <- complete(df, datetime = seq(min(datetime), max(datetime), by = "1 day"), fill = list(number = 0))
Solution 1:[1]
You can use the map() function from the purrr R package.
Note: I'm assuming that each RData file contains a data.frame called df with at least two columns called datetime and number.
First, define your "complete date" function as follows:
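library(tidyverse)
complete_date <- function(df) {
  min_date <- min(df$datetime)
  max_date <- max(df$datetime)
  # Every calendar day between the first and last observation
  table <- tibble(datetime = seq(min_date, max_date, by = "1 day"))
  # left_join keeps every day; days missing from df get NA, which is replaced by 0
  table %>%
    left_join(df, by = "datetime") %>%
    mutate(number = replace_na(number, 0))
}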
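To see what filling the gaps should produce, here is a minimal sketch on a made-up two-row data frame. It uses tidyr's complete() directly, as in the question, rather than the join-based function above; the toy values are purely illustrative.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): 2 Jan and 3 Jan are missing
df <- tibble(
  datetime = as.Date(c("2020-01-01", "2020-01-04")),
  number   = c(5, 7)
)

# Add one row per missing day and set `number` to 0 on the added rows
completed <- df %>%
  complete(
    datetime = seq(min(datetime), max(datetime), by = "1 day"),
    fill = list(number = 0)
  )

print(completed)
```

The result should have one row per day from the first to the last date, with number equal to 0 on the days that were absent from the input.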
Now apply this function to each RData file using map:
completed <- file_names %>%
  map(function(file_name) {
    load(file.path(path_to_read, file_name)) # Load the RData file first (assumed to create `df`)
    complete_date(df) # Apply the function
  })
This creates a list of all the completed data frames, which you can then write back to disk as RData files with save().
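With 184,000 files, it may be more practical to save each result as soon as it is computed instead of holding the whole list in memory. The following is a rough sketch of that idea, not a tested solution for the original data: the temporary folders and the example file are hypothetical stand-ins for path_to_read and path_to_save, and each file is still assumed to contain a data frame named df with datetime and number columns.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical folders for illustration; replace with the real paths
path_to_read <- file.path(tempdir(), "in")
path_to_save <- file.path(tempdir(), "out")
dir.create(path_to_read, showWarnings = FALSE)
dir.create(path_to_save, showWarnings = FALSE)

# Write one small example file so the sketch runs end to end
df <- tibble(datetime = as.Date(c("2020-01-01", "2020-01-03")), number = c(1, 2))
save(df, file = file.path(path_to_read, "example.RData"))
rm(df)

file_names <- list.files(path_to_read, pattern = "\\.RData$")

# walk() is like map() but is used for side effects (here: writing files)
walk(file_names, function(file_name) {
  # load() restores `df` into this function's environment
  load(file.path(path_to_read, file_name))
  df <- df %>%
    complete(
      datetime = seq(min(datetime), max(datetime), by = "1 day"),
      fill = list(number = 0)
    )
  # Save under the same file name in the output folder
  save(df, file = file.path(path_to_save, file_name))
})
```

Each output file keeps the name of its input file, so the investor and asset combination can still be identified from the file name.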
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |