"Bad magic number for file header" for working concatenated zip file using zipfile in Python 3.7

I'm trying to use Python's zipfile library to unzip a split ZIP file by concatenating all of the parts and then unzipping the result, but I keep getting a "Bad magic number for file header" error.

I'm writing a Python script that will normally receive a single ZIP file, but will very rarely receive a ZIP file split into multiple parts (for example, foo.zip.001, foo.zip.002, etc.). From what I can tell, there's no easy way to deal with this if you need to bundle the script with its dependencies in a Docker container. However, I came across this SO answer, which explains that you can concatenate the parts into a single ZIP file and treat it as such. So my plan is to concatenate all of the parts into one big ZIP file and then unzip that file. I created a test case (in a Mac terminal) from a video file with the following command:

$ zip -s 5m test ch4_3.mp4

Here's my code to concatenate all files together:

import zipfile

split_files = ['test.z01', 'test.z02', 'test.z03', 'test.zip']

with open('test_video.zip', 'wb') as f:
    for file in split_files:
        with open(file, 'rb') as zf:
            f.write(zf.read())
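
For larger parts, the same concatenation can be done in chunks rather than reading each part fully into memory. This is only a minimal sketch: `concatenate_parts` is a hypothetical helper name (not part of zipfile), and it assumes the parts are passed in the intended order.

```python
import shutil


def concatenate_parts(part_paths, output_path, chunk_size=1024 * 1024):
    """Append each part to output_path, in the order given."""
    with open(output_path, 'wb') as out:
        for part_path in part_paths:
            with open(part_path, 'rb') as part:
                # Stream the part in chunk_size pieces instead of
                # loading the whole part with read().
                shutil.copyfileobj(part, out, chunk_size)
    return output_path
```

Calling `concatenate_parts(['test.z01', 'test.z02', 'test.z03', 'test.zip'], 'test_video.zip')` should write the same bytes as the loop above, so it changes memory use only, not the resulting file.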

If I go to my terminal and run unzip test_video.zip, this is the output:

$ unzip test_video.zip
Archive:  test_video.zip
warning [test_video.zip]:  zipfile claims to be last disk of a multi-part archive;
  attempting to process anyway, assuming all parts have been concatenated
  together in order.  Expect "errors" and warnings...true multi-part support
  doesn't exist yet (coming soon).
warning [test_video.zip]:  15728640 extra bytes at beginning or within zipfile
  (attempting to process anyway)
file #1:  bad zipfile offset (local header sig):  15728644
  (attempting to re-compensate)
  inflating: ch4_3.mp4

unzip prints several warnings along the way, but the extraction succeeds. However, when I try to run the following code:

import os

if not os.path.exists('output'):
    os.mkdir('output')
with zipfile.ZipFile('tester/test_video.zip', 'r') as z:
    z.extractall('output')

I get the following error:

---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
<ipython-input-60-07a6f56ea685> in <module>()
      2     os.mkdir('output')
      3 with zipfile.ZipFile('tester/test_video.zip', 'r') as z:
----> 4     z.extractall('output')

~/anaconda3/lib/python3.6/zipfile.py in extractall(self, path, members, pwd)
   1499 
   1500         for zipinfo in members:
-> 1501             self._extract_member(zipinfo, path, pwd)
   1502 
   1503     @classmethod

~/anaconda3/lib/python3.6/zipfile.py in _extract_member(self, member, targetpath, pwd)
   1552             return targetpath
   1553 
-> 1554         with self.open(member, pwd=pwd) as source, \
   1555              open(targetpath, "wb") as target:
   1556             shutil.copyfileobj(source, target)

~/anaconda3/lib/python3.6/zipfile.py in open(self, name, mode, pwd, force_zip64)
   1371             fheader = struct.unpack(structFileHeader, fheader)
   1372             if fheader[_FH_SIGNATURE] != stringFileHeader:
-> 1373                 raise BadZipFile("Bad magic number for file header")
   1374 
   1375             fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])

BadZipFile: Bad magic number for file header

If I put the .zip file first in the concatenation order instead, this is what I get:

split_files = ['test.zip', 'test.z01', 'test.z02', 'test.z03']

with open('test_video.zip', 'wb') as f:
    for file in split_files:
        with open(file, 'rb') as zf:
            f.write(zf.read())

with zipfile.ZipFile('test_video.zip', 'r') as z:
    z.extractall('output')

Here's the output:

---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
<ipython-input-14-f7aab706dbed> in <module>()
      1 if not os.path.exists('output'):
      2     os.mkdir('output')
----> 3 with zipfile.ZipFile('test_video.zip', 'r') as z:
      4     z.extractall('output')

~/anaconda3/lib/python3.6/zipfile.py in __init__(self, file, mode, compression, allowZip64)
   1106         try:
   1107             if mode == 'r':
-> 1108                 self._RealGetContents()
   1109             elif mode in ('w', 'x'):
   1110                 # set the modified flag so central directory gets written

~/anaconda3/lib/python3.6/zipfile.py in _RealGetContents(self)
   1173             raise BadZipFile("File is not a zip file")
   1174         if not endrec:
-> 1175             raise BadZipFile("File is not a zip file")
   1176         if self.debug > 1:
   1177             print(endrec)

BadZipFile: File is not a zip file

Using the answer from this SO question, I've worked out that the header zipfile reads is b'PK\x07\x08' rather than the local file header signature b'PK\x03\x04', but I don't know why. I also ran testzip(), and it points straight to the culprit: ch4_3.mp4.
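
For reference, the check can be sketched roughly as follows. This is a minimal sketch, not the exact code from the linked answer: `read_signature`, `describe_signature` and `report_member_signatures` are hypothetical helper names, and the offsets are whatever zipfile reports in `ZipInfo.header_offset`. The signature values come from the ZIP format itself: `PK\x03\x04` marks a local file header, while `PK\x07\x08` is used both for data descriptors and as the marker at the start of the first segment of a split archive.

```python
import zipfile

# Four-byte signatures defined by the ZIP format.
KNOWN_SIGNATURES = {
    b'PK\x03\x04': 'local file header',
    b'PK\x01\x02': 'central directory file header',
    b'PK\x05\x06': 'end of central directory record',
    b'PK\x07\x08': 'data descriptor / split-archive marker',
}


def read_signature(path, offset):
    """Return the 4 bytes found at the given offset of the file."""
    with open(path, 'rb') as f:
        f.seek(offset)
        return f.read(4)


def describe_signature(signature):
    """Map a 4-byte signature to a readable name, or 'unknown'."""
    return KNOWN_SIGNATURES.get(signature, 'unknown')


def report_member_signatures(path):
    """For each member, return (name, header_offset, signature, description)."""
    report = []
    with zipfile.ZipFile(path, 'r') as z:
        for info in z.infolist():
            # header_offset is where zipfile expects the local file header.
            sig = read_signature(path, info.header_offset)
            report.append(
                (info.filename, info.header_offset, sig, describe_signature(sig))
            )
    return report
```

Printing the result of `report_member_signatures('test_video.zip')` shows where zipfile expects each local header and which signature is actually at that position.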

You can find the ZIP file in question at this link. Any ideas on what to do?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow