"Bad magic number for file header" for working concatenated zip file using zipfile in Python 3.7
I'm trying to use the Python zipfile library to unzip a split ZIP file by concatenating all file splits and then unzipping the final product, but I keep getting hit with the "Bad magic number for file header" error using this library.
I'm writing a Python script which will normally receive a single ZIP file, but will very rarely receive a ZIP file split into multiple parts (for example, foo.zip.001, foo.zip.002, etc). From what I can tell, there's no easy way to deal with this if you need to bundle the script up with its dependencies for a Docker container. However, I stumbled across this SO answer which explains that you can concatenate the files into a single ZIP file and treat it as such. So my battle plan is to concatenate all file splits into one big ZIP file and then unzip this file. I created a test case (with a Mac terminal) using a video file with the following command:
$ zip -s 5m test ch4_3.mp4
Here's my code to concatenate all files together:
import zipfile
split_files = ['test.z01', 'test.z02', 'test.z03', 'test.zip']
with open('test_video.zip', 'wb') as f:
    for file in split_files:
        with open(file, 'rb') as zf:
            f.write(zf.read())
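The same concatenation step can be written so that each part is streamed rather than read into memory in one go, which may matter for large splits. This is only a sketch: `concatenate_parts` is a hypothetical helper name, and it assumes the caller already passes the parts in the correct order (for `zip -s` output, the `.z01`, `.z02`, ... files first and the `.zip` file last).

```python
import shutil
from pathlib import Path


def concatenate_parts(part_paths, output_path):
    """Append each part to output_path in the given order.

    Hypothetical helper: the caller is responsible for the ordering.
    Returns the size in bytes of the combined file.
    """
    with open(output_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Copy in chunks so a large part is never fully loaded into memory.
                shutil.copyfileobj(src, out)
    return Path(output_path).stat().st_size


# Example call, assuming the split files from the question exist:
# concatenate_parts(['test.z01', 'test.z02', 'test.z03', 'test.zip'], 'test_video.zip')
```

This only changes how the bytes are copied. The resulting file is byte-for-byte the same as the one produced by the loop above, so it does not by itself make the archive readable by `zipfile`.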
If I go to my terminal and run unzip test_video.zip, this is the output:
$ unzip test_video.zip
Archive: test_video.zip
warning [test_video.zip]: zipfile claims to be last disk of a multi-part archive;
attempting to process anyway, assuming all parts have been concatenated
together in order. Expect "errors" and warnings...true multi-part support
doesn't exist yet (coming soon).
warning [test_video.zip]: 15728640 extra bytes at beginning or within zipfile
(attempting to process anyway)
file #1: bad zipfile offset (local header sig): 15728644
(attempting to re-compensate)
inflating: ch4_3.mp4
unzip emits several warnings along the way, but the extraction succeeds. However, when I try to run the following code:
if not os.path.exists('output'):
    os.mkdir('output')
with zipfile.ZipFile('tester/test_video.zip', 'r') as z:
    z.extractall('output')
I get the following error:
---------------------------------------------------------------------------
BadZipFile Traceback (most recent call last)
<ipython-input-60-07a6f56ea685> in <module>()
2 os.mkdir('output')
3 with zipfile.ZipFile('tester/test_video.zip', 'r') as z:
----> 4 z.extractall('output')
~/anaconda3/lib/python3.6/zipfile.py in extractall(self, path, members, pwd)
1499
1500 for zipinfo in members:
-> 1501 self._extract_member(zipinfo, path, pwd)
1502
1503 @classmethod
~/anaconda3/lib/python3.6/zipfile.py in _extract_member(self, member, targetpath, pwd)
1552 return targetpath
1553
-> 1554 with self.open(member, pwd=pwd) as source, \
1555 open(targetpath, "wb") as target:
1556 shutil.copyfileobj(source, target)
~/anaconda3/lib/python3.6/zipfile.py in open(self, name, mode, pwd, force_zip64)
1371 fheader = struct.unpack(structFileHeader, fheader)
1372 if fheader[_FH_SIGNATURE] != stringFileHeader:
-> 1373 raise BadZipFile("Bad magic number for file header")
1374
1375 fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
BadZipFile: Bad magic number for file header
If I try to run it with the .zip file before the others, this is what I get:
split_files = ['test.zip', 'test.z01', 'test.z02', 'test.z03']
with open('test_video.zip', 'wb') as f:
    for file in split_files:
        with open(file, 'rb') as zf:
            f.write(zf.read())
with zipfile.ZipFile('test_video.zip', 'r') as z:
    z.extractall('output')
Here's the output:
---------------------------------------------------------------------------
BadZipFile Traceback (most recent call last)
<ipython-input-14-f7aab706dbed> in <module>()
1 if not os.path.exists('output'):
2 os.mkdir('output')
----> 3 with zipfile.ZipFile('test_video.zip', 'r') as z:
4 z.extractall('output')
~/anaconda3/lib/python3.6/zipfile.py in __init__(self, file, mode, compression, allowZip64)
1106 try:
1107 if mode == 'r':
-> 1108 self._RealGetContents()
1109 elif mode in ('w', 'x'):
1110 # set the modified flag so central directory gets written
~/anaconda3/lib/python3.6/zipfile.py in _RealGetContents(self)
1173 raise BadZipFile("File is not a zip file")
1174 if not endrec:
-> 1175 raise BadZipFile("File is not a zip file")
1176 if self.debug > 1:
1177 print(endrec)
BadZipFile: File is not a zip file
Using the answer from this SO question, I've worked out that the header is b'PK\x07\x08', but I don't know why. I also used the testzip() function and it points straight to the culprit: ch4_3.mp4.
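For context on those four bytes: in the ZIP format, b'PK\x07\x08' is the signature used for the optional data descriptor, and the same value serves as the marker at the very start of the first segment of a split archive. zipfile, by contrast, expects b'PK\x03\x04' wherever it looks for a local file header. A small sketch for checking which signature sits at a given offset might look like the following. The names `ZIP_SIGNATURES`, `describe_signature`, and `signature_at` are hypothetical helpers, not part of zipfile.

```python
# Well-known 4-byte ZIP record signatures (all start with b"PK").
ZIP_SIGNATURES = {
    b"PK\x03\x04": "local file header",
    b"PK\x01\x02": "central directory file header",
    b"PK\x05\x06": "end of central directory record",
    b"PK\x07\x08": "data descriptor or split/spanned archive marker",
}


def describe_signature(data, offset=0):
    """Name the ZIP signature found in `data` at `offset` (hypothetical helper)."""
    sig = bytes(data[offset:offset + 4])
    return ZIP_SIGNATURES.get(sig, "unknown ({!r})".format(sig))


def signature_at(path, offset=0):
    """Read four bytes from the file at `path`, starting at `offset`, and name them."""
    with open(path, "rb") as f:
        f.seek(offset)
        return describe_signature(f.read(4))


# Example call, assuming the concatenated file from the question exists:
# print(signature_at('test_video.zip', 0))
```

Running `signature_at` at offset 0 and at offset 4 of the concatenated file would show whether a local file header directly follows a leading marker. This is a diagnostic aid only and does not repair the archive.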
You can find the ZIP file in question at this link here. Any ideas on what to do?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow