Errno 22 Invalid argument - Zipfile Is Skipped

I am working on a project in Python in which I am parsing data from a zipped folder containing log files. The code works fine for most zips, but occasionally this exception is thrown:

[Errno 22] Invalid argument

As a result, the entire file is skipped, which excludes the data in the desired log files from the results. When I try to extract the zipped file using the default Windows utility, I am met with a zip error. However, when I try to extract the file with 7zip, it does so successfully, save for 2 errors:

1 <path> Unexpected End of Data
2 Data error:  x.csv

x.csv is totally unrelated to the log I am trying to parse. I therefore need code that is resilient enough that, if an unrelated file in the archive is corrupted, it can still parse the logs that are intact.

At the moment, I am using the zipfile module to extract the files into memory. Is there a robust way to do this without the entire file being skipped?

Update 1: I believe the error I am running into is that the zipfile is missing a footer. I realized this when looking at it in a hex editor. I do not really have any idea how to safely edit the actual file using Python. Here is the code that I am using to extract zips into memory:

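    import os
    import zipfile

    for zip in os.listdir(directory):  # directory: folder holding the zips
        try:
            if zip.lower().endswith('.zip'):
                if os.path.isfile(directory + "\\" + zip):
                    logs = zipfile.ZipFile(directory + "\\" + zip)
                    for log in logs.namelist():
                        if log.endswith('log.txt'):
                            data = logs.read(log)
        except Exception as e:
            # Handler not shown in the original post; assumed to report and skip.
            print(e)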

Update 2: Traceback for the error:

Traceback (most recent call last):
  File "c:/Users/xxx/Desktop/Python Porjects/PE/logParse.py", line 28, in parse
    logs = zipfile.ZipFile(directory + "\\" + zip)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1222, in __init__
    self._RealGetContents()
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1289, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file


Solution 1:[1]

The stack trace suggests that the problem is not in your code: the error is raised by Python's zipfile module itself while it reads the archive.

It looks like Python's zip handling is stricter than that of other programs (see this bug report, in which a user describes a difference between Python's behaviour and that of other programs such as GNOME Archive Manager).

It may be worth filing a bug report.
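
In the meantime, one possible workaround for the "missing footer" case from the question is to bypass the central directory and read entries straight from their local file headers. The following is only a minimal sketch under several assumptions: `salvage_members` is a hypothetical helper (not part of `zipfile`), it handles only stored and deflated entries without encryption or ZIP64, and the in-memory archive with its file names is made up for illustration. Entries that fail to decode are skipped rather than aborting the whole archive.

```python
import io
import struct
import zipfile
import zlib

LOCAL_SIG = b"PK\x03\x04"      # local file header signature
LOCAL_FMT = "<4sHHHHHIIIHH"    # fixed 30-byte part of a local file header
LOCAL_LEN = struct.calcsize(LOCAL_FMT)


def salvage_members(raw, wanted_suffix="log.txt"):
    """Hypothetical helper: recover members by scanning local file headers.

    Supports only stored (0) and deflated (8) entries, without encryption
    or ZIP64. Entries that fail to decode are skipped, not fatal.
    """
    recovered = {}
    pos = raw.find(LOCAL_SIG)
    while pos != -1 and pos + LOCAL_LEN <= len(raw):
        (_sig, _ver, flags, method, _time, _date, crc, csize, _usize,
         name_len, extra_len) = struct.unpack_from(LOCAL_FMT, raw, pos)
        name_start = pos + LOCAL_LEN
        encoding = "utf-8" if flags & 0x800 else "cp437"
        name = raw[name_start:name_start + name_len].decode(encoding, "replace")
        start = name_start + name_len + extra_len
        has_descriptor = bool(flags & 0x08)  # sizes/CRC stored after the data
        next_search = pos + len(LOCAL_SIG)   # fallback: resume just past this hit
        try:
            if method == 8:
                inflater = zlib.decompressobj(-15)  # raw deflate stream
                payload = inflater.decompress(raw[start:])
                if not inflater.eof:
                    raise ValueError("deflate stream is truncated")
                end = len(raw) - len(inflater.unused_data)
            elif method == 0 and not has_descriptor:
                end = start + csize
                payload = raw[start:end]
                if len(payload) != csize:
                    raise ValueError("stored entry is truncated")
            else:
                raise ValueError("unsupported entry")
            if not has_descriptor and zlib.crc32(payload) != crc:
                raise ValueError("CRC mismatch")
            if name.endswith(wanted_suffix):
                recovered[name] = payload
            next_search = end
        except (zlib.error, ValueError):
            pass  # corrupted or unsupported entry: skip it, keep scanning
        pos = raw.find(LOCAL_SIG, next_search)
    return recovered


# Build a small archive in memory, then cut off the end-of-central-directory
# record to imitate the "missing footer" case (illustrative data only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("run1/log.txt", "line one\nline two\n")
    archive.writestr("x.csv", "a,b\n1,2\n")
whole = buffer.getvalue()
broken = whole[:whole.rfind(b"PK\x05\x06")]

try:
    with zipfile.ZipFile(io.BytesIO(broken)) as archive:
        logs = {n: archive.read(n) for n in archive.namelist()
                if n.endswith("log.txt")}
except zipfile.BadZipFile:
    logs = salvage_members(broken)  # fall back to the header scan
```

Trying `zipfile` first and only falling back on `BadZipFile` keeps the normal path unchanged for healthy archives. Note that a signature scan can hit false positives inside compressed data, so the recovered contents should be treated with some caution.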

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ndclt