Compute the size of a directory in R
I want to compute the size of a directory in R. I tried to use the file.info
function, but unfortunately it follows symbolic links, so my results are biased:
# return wrong size, with duplicate counts for symlinks
sum(file.info(list.files(path = '/my/directory/', recursive = T, full.names = T))$size)
How do I compute the total size of a directory in R so that it gives the same result as `du -s` on Linux?
Thanks
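For context, the bias described above can be reproduced outside R. A minimal sketch, assuming GNU coreutils `stat` on Linux: R's `file.info()` always dereferences symbolic links (like `stat -L`), so summing its sizes counts a file once per extra symlink pointing at it.

```shell
# R's file.info() stats the *target* of a symlink, like `stat -L` below.
# Summing those sizes counts a file once per extra symlink to it.
dir=$(mktemp -d)
head -c 4096 /dev/zero > "$dir/data.bin"   # 4 KiB regular file
ln -s "$dir/data.bin" "$dir/link"          # symbolic link to it
stat -c  %s "$dir/link"   # size of the link itself (length of the target path)
stat -Lc %s "$dir/link"   # size of the target (4096) -- what file.info() reports
```

`du -s`, by contrast, does not follow symlinks by default, which is why its totals differ from the naive `file.info` sum.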
Solution 1:[1]
On Windows, you can shell out to PowerShell from R:
system('powershell -noprofile -command "ls -r|measure -s Length"')
References:
- https://technet.microsoft.com/en-us/library/ff730945.aspx
- Get Folder Size from Windows Command Line
- https://stat.ethz.ch/R-manual/R-devel/library/base/html/system.html
- https://superuser.com/questions/217773/how-can-i-check-the-actual-size-used-in-an-ntfs-directory-with-many-hardlinks
You can also leverage Cygwin if you have it installed; it lets you run Linux commands on Windows and get comparable results. There is also a nice Sysinternals-based solution in the last link above.
Solution 2:[2]
I finally used this:
system('du -s')
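A note on the flags, assuming GNU `du`: with no path argument, `du -s` summarizes the current working directory (so in R, whatever `getwd()` returns) and reports 1 KiB blocks of disk usage, not bytes. `-b` reports apparent size in bytes, which is closer to what summing `file.size()` gives; a sketch:

```shell
# `du -s` with no path summarizes the current directory in 1 KiB blocks;
# `du -sb` reports apparent size in bytes, closer to a file.size() sum.
dir=$(mktemp -d)
head -c 10240 /dev/zero > "$dir/ten_kib.bin"   # 10 KiB file
du -s  "$dir"    # disk usage in 1 KiB blocks
du -sb "$dir"    # apparent size in bytes
```

From R, `system("du -sb /my/directory", intern = TRUE)` returns the output line as a character string you can parse instead of just printing it.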
Solution 3:[3]
A handy solution, useful for example for checking a package's size:
dir_size <- function(path, recursive = TRUE) {
  stopifnot(is.character(path))
  files <- list.files(path, full.names = TRUE, recursive = recursive)
  vect_size <- sapply(files, file.size)  # file.size is vectorised; sapply keeps names
  sum(vect_size)
}
cat(dir_size(find.package("Rcpp"))/10**6, "MB")
#> 14.81649 MB
Created on 2021-06-26 by the reprex package (v2.0.0)
Solution 4:[4]
`file.size` returns the actual (apparent) size; size on disk is the amount of space actually taken up on the disk. See https://superuser.com/questions/66825/what-is-the-difference-between-size-and-size-on-disk to understand the difference. Try this for the total size of all files:
files <- list.files(path_of_directory, full.names = TRUE)
vect_size <- sapply(files, file.size)
size_files <- sum(vect_size)
Solution 5:[5]
I dealt with this problem recently, and here is my code:
library(pacman)
p_load(fs, tidyfst)

sys_time_print({
  dir_info(your_directory_path) -> your_dir_info
})

your_dir_info %>%
  summarise_dt(size = sum(size, na.rm = TRUE))
The first time I ran the code above, it took about 3 minutes to scan 52 GB of data spread over 174,731 files. When I ran it again later, it took less than 6 seconds, presumably thanks to filesystem caching. This is amazing.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Community |
| Solution 2 | Carmellose |
| Solution 3 | |
| Solution 4 | islem |
| Solution 5 | Hope |