Python ZipFile does not compress as small as zip in terminal
I have a large amount of data I am writing to a zip file using the following code:
import json
from zipfile import ZipFile, ZIP_DEFLATED
with ZipFile(out_path, 'w', ZIP_DEFLATED) as z_file:
    z_file.writestr(filename, json.dumps(data))
The result is a file that is 3.1 MB. When I uncompress it with unzip, I get a new file that is 33 MB. Then, when I recompress it using zip, I get a file that is 2.1 MB.
Why are the two compressed files different sizes?
I am on a Mac, so I am using the Mac version of zip: Zip 3.0 (July 5th 2008), by Info-ZIP. I am using Python 3.9.4. The man page for zip says it uses the deflate algorithm by default, which I believe is the same algorithm I used in the Python code above.
I did see this similar question, but the issue there was that the file was not compressing at all because a ZipInfo object was being used. My file does compress, just not as small as with zip, and I'm not using a ZipInfo object.
Update
I ran some tests to see if the difference was caused by the compresslevel option. The man page for zip says the default compress level is 6. So I used Python's ZipFile and the zip command to compress the same file, first without a compress level value (the default), then by explicitly setting it to 6, and then 9.
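A minimal sketch of how the ZipFile side of such a test could look, not the exact code used. It assumes Python 3.7 or later, where ZipFile accepts a compresslevel argument; the sample data and the zipped_size helper are hypothetical stand-ins.

```python
import io
import json
from zipfile import ZipFile, ZIP_DEFLATED


def zipped_size(payload, compresslevel=None):
    """Return the size in bytes of an in-memory archive holding payload."""
    buffer = io.BytesIO()
    # compresslevel=None leaves the choice of level to zlib's default.
    with ZipFile(buffer, "w", ZIP_DEFLATED, compresslevel=compresslevel) as z_file:
        z_file.writestr("data.json", payload)
    return len(buffer.getvalue())


# Hypothetical stand-in for the real data set.
sample = json.dumps([{"id": i, "name": f"item-{i}"} for i in range(50_000)])

sizes = {level: zipped_size(sample, level) for level in (None, 6, 9)}
for level, size in sizes.items():
    print(f"compresslevel={level}: {size} bytes")
```

Writing to an io.BytesIO buffer keeps the comparison free of disk effects; the same call works with a file path in place of the buffer.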
In my program, I am generating the data and then writing it to a file. I wanted to see if there was a difference between compressing data held in memory and compressing data read from a file, so I also used ZipFile to compress the data in both of these scenarios.
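The two scenarios can be sketched as follows, assuming writestr for the in-memory case and write for the on-disk case. The payload, file names, and temporary directory are hypothetical and only serve to make the sketch self-contained.

```python
import json
import os
import tempfile
from zipfile import ZipFile, ZIP_DEFLATED

# Hypothetical stand-in for the generated data.
payload = json.dumps({"values": list(range(100_000))})

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "data.json")
    with open(json_path, "w", encoding="utf-8") as f:
        f.write(payload)

    # Scenario 1: compress the string that is still in memory.
    mem_zip = os.path.join(tmp_dir, "from_memory.zip")
    with ZipFile(mem_zip, "w", ZIP_DEFLATED) as z_file:
        z_file.writestr("data.json", payload)

    # Scenario 2: compress the same content read back from disk.
    file_zip = os.path.join(tmp_dir, "from_file.zip")
    with ZipFile(file_zip, "w", ZIP_DEFLATED) as z_file:
        z_file.write(json_path, arcname="data.json")

    mem_size = os.path.getsize(mem_zip)
    file_size = os.path.getsize(file_zip)

print(f"from memory: {mem_size} bytes, from file: {file_size} bytes")
```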
I also tried using shutil for good measure. It doesn't appear to have an option for compress level, so I just have the default result.
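A rough sketch of the shutil variant, assuming shutil.make_archive with the "zip" format. The directory layout and file contents are hypothetical; make_archive archives a whole directory rather than a single file, so the JSON file is placed in its own folder here.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Keep the source in its own folder so the archive is not created inside it.
    src_dir = os.path.join(tmp_dir, "src")
    os.makedirs(src_dir)
    with open(os.path.join(src_dir, "data.json"), "w", encoding="utf-8") as f:
        f.write('{"values": ' + str(list(range(100_000))) + "}")

    # make_archive appends ".zip" to base_name and returns the archive path.
    archive_path = shutil.make_archive(
        os.path.join(tmp_dir, "data"), "zip", root_dir=src_dir
    )
    archive_name = os.path.basename(archive_path)
    archive_size = os.path.getsize(archive_path)

print(f"{archive_name}: {archive_size} bytes")
```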
Here are the results.
| Zip Method | Default | Level 6 | Level 9 |
|---|---|---|---|
| Unzipped | 33M | N/A | N/A |
| ZipFile Generated Data | 3.1M | 3.1M | 2.9M |
| ZipFile From File | 3.1M | 2.1M | 2.9M |
| zip From File | 2.1M | 4.2M | 2.3M |
| shutil From File | 2.1M | N/A | N/A |
I find it very strange that 1) none of the results from ZipFile and zip match for the same compress level, and 2) explicitly setting the compress level to 6 for zip (which is supposedly the default) makes the file twice as large.
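One way to compare the archives entry by entry, rather than by total file size, is to print what each archive records about its members. The sketch below assumes only the standard zipfile module; the describe helper and the in-memory archive are hypothetical, and in practice each .zip path would be passed in instead.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED


def describe(archive):
    """Return (name, compress_type, file_size, compress_size) per entry."""
    with ZipFile(archive) as z_file:
        return [
            (info.filename, info.compress_type, info.file_size, info.compress_size)
            for info in z_file.infolist()
        ]


# Hypothetical in-memory archive; pass the path to a real .zip file instead.
buffer = io.BytesIO()
with ZipFile(buffer, "w", ZIP_DEFLATED) as z_file:
    z_file.writestr("data.json", '{"key": "value"}' * 10_000)

entries = describe(buffer)
for name, method, raw, packed in entries:
    print(f"{name}: method={method}, {raw} -> {packed} bytes")
```

Running describe on the archives produced by ZipFile, zip, and shutil would show whether they hold the same entries with the same compression method and uncompressed size.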
Any more thoughts as to what is going on?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow