R: unzipping large compressed .csv yields "zip file is corrupt" warning

I am downloading a 78MB zip file from the UN FAO, which contains a 2.66GB csv. I am able to unzip the downloaded file from a folder using WinZip, but have been unable to unzip it using unzip() in R:

Warning - 78MB download!

url <- "http://fenixservices.fao.org/faostat/static/bulkdownloads/FoodBalanceSheets_E_All_Data_(Normalized).zip"
path <- file.path(getwd(), "zipped_data.zip")
download.file(url, path, mode = "wb")
unzipped_data <- unzip(path)

This results in a warning and a failure to unzip the file:

Warning message:
In unzip(path) : zip file is corrupt

In the ?unzip documentation I see

"It does have some support for bzip2 compression and > 2GB zip files (but not >= 4GB files pre-compression contained in a zip file: like many builds of unzip it may truncate these, in R's case with a warning if possible)"

This makes me believe that unzip() should handle my file, and this same process has successfully downloaded, unzipped, and read multiple other smaller tables from FAOSTAT. Is there a chance that the size of my csv is the source of this error? If so, what is the workaround?
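One way to check the size hypothesis is to read the archive listing and compare each entry's uncompressed size with the 4GB limit quoted from ?unzip. This is only a sketch: zip_entry_sizes is a hypothetical helper name, and it assumes the listing can still be read even when extraction fails.

```r
# Hypothetical helper (not from the original post): list archive entries and
# flag any whose uncompressed size reaches the 4GB limit quoted from ?unzip.
zip_entry_sizes <- function(zipfile) {
  # list = TRUE reads the archive listing without extracting anything;
  # the result has columns Name, Length (uncompressed bytes) and Date
  entries <- utils::unzip(zipfile, list = TRUE)
  entries$size_gb <- entries$Length / 1024^3
  entries$at_or_over_4gb <- entries$Length >= 4 * 1024^3
  entries
}

# Usage, assuming `path` points at the downloaded archive:
# zip_entry_sizes(path)
```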



Solution 1:[1]

I can't test this solution, and it depends on your installation, but hopefully it will work or at least point you to a suitable solution:

You can run WinZip from the command line; this page shows the structure of the call.

You can also run command lines from R with system() or shell() (which is just a wrapper for system()).

The command line general structure to extract would be:

winzip32 -e [options] filename[.zip] folder

So we create a string with this structure and your input paths, and wrap it in a function that mimics unzip with the parameters zipfile and exdir:

unzip_wz <- function(zipfile, exdir) {
  # I don't know how/if unzip creates folders, you might want to tweak or remove this line altogether
  dir.create(exdir, recursive = FALSE, showWarnings = FALSE)
  # The Windows shell does not treat single quotes as quoting characters, so wrap the paths in double quotes
  str1 <- sprintf('winzip32 -e "%s" "%s"', zipfile, exdir)
  # Set wait = FALSE if you want the program to keep running while unzipping; proceed with caution, but in some cases that could be an improvement on your current solution
  shell(str1, wait = TRUE)
}

You can try using this function in place of unzip. It assumes that winzip32 has been added to your system PATH variable. If it hasn't, either add it or replace it with the executable's full path, so you have something like:

str1 <- sprintf("'C://probably/somewhere/in/program/files/winzip32.exe' -e '%s' '%s'",zipfile,exdir)

PS: use full paths! The command line doesn't know your R working directory (we could implement that feature in our function if needed).
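The full-path feature mentioned above could be sketched as follows. build_winzip_cmd is a hypothetical helper name, it assumes winzip32 is on the PATH, and like the rest of this answer it has not been run against WinZip itself. It only builds the command string, which can then be passed to shell().

```r
# Hypothetical helper (not part of the original answer): build the WinZip
# command line from full, quoted paths so it does not depend on the
# working directory.
build_winzip_cmd <- function(zipfile, exdir) {
  # mustWork = TRUE stops early if the archive does not exist
  zipfile <- normalizePath(zipfile, mustWork = TRUE)
  # create the output folder first so that its full path can be resolved
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  exdir <- normalizePath(exdir, mustWork = TRUE)
  # type = "cmd" wraps each path in double quotes for the Windows shell
  sprintf("winzip32 -e %s %s",
          shQuote(zipfile, type = "cmd"),
          shQuote(exdir, type = "cmd"))
}

# Usage on Windows (assumes winzip32 is on the PATH):
# shell(build_winzip_cmd("zipped_data.zip", "unzipped"), wait = TRUE)
```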

Solution 2:[2]

I had the same problem running unzip() on Ubuntu Server 20.04. Calling unzip(..., unzip = "/usr/bin/unzip") instead of leaving the default unzip = "internal" did the trick.
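A minimal sketch of that idea, which tries the internal method first and only then hands over to an external program. unzip_with_fallback is a hypothetical helper name, and the sketch assumes an unzip executable can be found on the PATH (for example /usr/bin/unzip on Ubuntu).

```r
# Hypothetical helper (not from the original answer): try R's internal unzip
# first, then fall back to an external unzip program found on the PATH.
unzip_with_fallback <- function(zipfile, exdir = ".") {
  # Treat a warning (such as "zip file is corrupt") like a failure
  extracted <- tryCatch(
    utils::unzip(zipfile, exdir = exdir),
    warning = function(w) NULL,
    error = function(e) NULL
  )
  if (length(extracted) > 0) return(invisible(exdir))

  # Sys.which() returns "" when the program is not found
  external <- unname(Sys.which("unzip"))
  if (!nzchar(external)) {
    stop("internal unzip failed and no external 'unzip' program was found")
  }
  utils::unzip(zipfile, exdir = exdir, unzip = external)
  invisible(exdir)
}

# Usage, assuming `path` points at the downloaded archive:
# unzip_with_fallback(path, exdir = "unzipped")
```

The helper returns the extraction folder rather than the extracted file names, because unzip() does not report file names when an external program does the work.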

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Matifou