Apply the same function over all the elements in a folder

I have a folder on my computer containing 184,000 different .RData files. Each one is a small data frame holding one investor's transactions in one specific asset, and together the data frames cover the combinations of 4,000 investors and 6,000 assets. The data have gaps, so I want to use the complete() function to fill in each data frame with the missing rows, based on the datetime column.
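To make the goal concrete, here is a minimal sketch of what complete() does to a single data frame. The toy data are made up for illustration, and it assumes the columns are called datetime and number, as in the code below; tidyr's complete() evaluates the seq(min(datetime), max(datetime), ...) expression against the data frame's own columns.

```r
library(dplyr)
library(tidyr)

# Made-up toy data: one investor/asset pair with no row for 2021-01-02
trades <- tibble(
  datetime = as.Date(c("2021-01-01", "2021-01-03")),
  number   = c(5, 2)
)

# complete() adds a row for every day between the first and last date;
# `fill` sets `number` to 0 on the rows it adds
completed <- trades %>%
  complete(datetime = seq(min(datetime), max(datetime), by = "1 day"),
           fill = list(number = 0))

print(completed)
```

The result has three rows (2021-01-01, 2021-01-02, 2021-01-03), with number equal to 5, 0 and 2.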

I want R to apply the complete() function to every file in that folder on my PC, but I have no idea how.
I came up with the basic idea in the following lines of code, but I don't know how to tell R to apply it to the entire folder.

library(tidyr)

path_to_read <- "dev/test-data/investors-rdata-assetbased/" # folder with the single .RData files

path_to_save <- "dev/test-data/investors-completedatetime/"

file_names <- list.files(path_to_read, pattern = "\\.RData$", full.names = TRUE)

df$datetime <- as.Date(df$datetime, format = "%Y-%m-%d")
df <- complete(df, datetime = seq(min(datetime), max(datetime), by = "1 day"), fill = list(number = 0))


Solution 1:[1]

You can use the map() function from the purrr package.

Note: I'm assuming that each .RData file contains a data frame called df with at least two columns called datetime and number.

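First, define your "complete date" function as follows:

library(tidyverse)

complete_date <- function(df) {
  df$datetime <- as.Date(df$datetime)
  min_date <- min(df$datetime)
  max_date <- max(df$datetime)

  # One row per day between the first and last transaction
  all_dates <- tibble(datetime = seq(min_date, max_date, by = "1 day"))

  # left_join keeps every day (an inner_join would drop the missing ones),
  # then number is set to 0 on the days that had no transaction
  all_dates %>%
    left_join(df, by = "datetime") %>%
    mutate(number = replace_na(number, 0))
}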

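Then apply this function to each .RData file using map():

# full.names = TRUE so that load() gets the complete path to each file
file_names <- list.files(path_to_read, pattern = "\\.RData$", full.names = TRUE)

completed <- file_names %>%
  map(function(file_name) {
    load(file_name) # Load RData first; this creates df inside the function
    complete_date(df) # Apply the function
  })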

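This returns a list of all the completed data frames, which you can then write back to .RData files with save(). With 184,000 files, it may be lighter on memory to save each completed data frame inside the loop rather than collecting them all in one list.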
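One way to do the load, complete and save steps file by file is purrr's walk(), which runs a function for its side effects instead of building a list. The sketch below wraps this in a hypothetical helper, complete_folder(); like the answer, it assumes every file stores a data frame called df with columns datetime and number, and it writes each result under the same file name into path_to_save. Loading into a fresh environment keeps the objects from each file separate.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper: complete every .RData file in `path_to_read` and save
# the result, under the same file name, into `path_to_save`.
# Assumes each file stores a data frame `df` with columns `datetime` and `number`.
complete_folder <- function(path_to_read, path_to_save) {
  dir.create(path_to_save, recursive = TRUE, showWarnings = FALSE)
  files <- list.files(path_to_read, pattern = "\\.RData$", full.names = TRUE)

  walk(files, function(file_name) {
    env <- new.env()
    load(file_name, envir = env)  # objects from the file land in `env`
    df <- env$df %>%
      mutate(datetime = as.Date(datetime)) %>%
      complete(datetime = seq(min(datetime), max(datetime), by = "1 day"),
               fill = list(number = 0))
    # Save the completed data frame, still named `df`, to the output folder
    save(df, file = file.path(path_to_save, basename(file_name)))
  })

  invisible(length(files))  # number of files processed
}
```

With the paths from the question, the call would be complete_folder("dev/test-data/investors-rdata-assetbased/", "dev/test-data/investors-completedatetime/"). Only one data frame is held in memory at a time, at the cost of not getting the completed data back as a list.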

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1