'Importing multiple Excel files with multiple sheets and creating a new column based on sheet's names
I am trying to upload multiple Excels files (same format) and each of these files has multiple sheets, in nature each Excel file has information about a hotel bookings, and each sheet is information for one month (f.i jan, feb, oct) and each file is the information of a different hotel name (f.i hilton, paris, Vac_in) to illustrate let's just say that we have 2 files in working directory, first file represents bookings for the Hilton Hotel and the second file is Paris Hotel, file one (Hilton) has 3 sheets named "jan","feb" and "mar" and so does file 2 (Paris Hotel)
Each sheet (regarding of the file is in.. always has the same column names) so by the end of this process I want all data to be imported into one full dataframe
and I trying to add two new columns one that will have the name of the excel file and the other the name of the sheet it took the information from something like this:
ID | COST | "FILE NAME" | "SHEET NAME" |
---|---|---|---|
A | 5 | Hilton | Jan |
X | 8 | Hilton | Jan |
B | 1 | Hilton | Feb |
C | 2 | Hilton | Mar |
R | 59 | Paris | Jan |
G | 22 | Paris | Feb |
U | 10 | Paris | Mar |
W | 78 | Paris | Mar |
I will have the above desired result because the first file had the name "Hilton" and had 2 rows of info in sheet one named "Jan", and one row of info in sheet "Feb" in the same way the second file had name "Paris" and had sheets named Jan, Feb and took all rows from them and put it into the full dataframe
I can successfully upload all the information from every file and its sheets using this code:
library(tidyverse)
library(fs)
library(readxl)
path <- fs::dir_ls(choose.dir())
read_all_files_&_sheets <- function(path) {
path %>%
excel_sheets() %>%
set_names() %>%
map_df(read_excel, path = path)
}
data <- path%>%
map(read_all_files_)
full_db<-bind_rows(data)
But I can't find a way to mutate and create the new columns "FILE NAME" and "SHEET NAME" I had tried for hours but I only get errors, I will be so thankful if you can kindly help me out or reference a blog or a book where I can learn how to do it, thank you so much
Solution 1:[1]
How about changing your function slightly, like this:
read_all_files_<- function(path) {
path %>%
excel_sheets() %>%
set_names() %>%
map_df(~read_excel(.x, path = path) %>% mutate(file=path, sheet=.x))
}
Explanation:
Let's say the first file in your vector of paths (i.e. path[1]) is "<folder_structure>hilton.xlsx"
. Then, when you pass this string to read_all_files_()
, the result of piping through set_names()
is a named list of the sheets
Jan Feb Mar
"Jan" "Feb" "Mar"
You then pipe these three sheet names to map_df()
. While map_df(read_excel, path=path)
will return the data for you, you need to adjust the function to additionally mutate the frame that map_df()
is going to return. Therefore instead of .f=read_excel
, we change this to .f=~read_excel(.x, path=path) %>% mutate(....)
, where the mutate
part is adding those two columns, file
and sheet
, assigning the value of the path
and the sheet, which is .x
.
Hope this helps.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |