Python ZipFile does not compress as small as zip in terminal

I have a large amount of data I am writing to a zip file using the following code:

import json
from zipfile import ZipFile, ZIP_DEFLATED

with ZipFile(out_path, 'w', ZIP_DEFLATED) as z_file:
    z_file.writestr(filename, json.dumps(data))

The result is a file that is 3.1 MB. When I uncompress it with unzip, I get a new file that is 33 MB. Then, when I recompress it using zip, I get a file that is 2.1 MB.

Why are the two compressed files different sizes?

I am on a Mac, so I am using the macOS build of zip: Zip 3.0 (July 5th 2008), by Info-ZIP. My Python version is 3.9.4. The man page for zip says it uses the deflate algorithm by default, which I believe is the same algorithm I selected with ZIP_DEFLATED in the Python code above.
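One way to check that both archives really use the same method is to read the metadata stored for each entry. The following is a minimal sketch, not part of my original script: `describe_archive` and the sample payload are hypothetical names introduced for illustration, and the demo runs on an in-memory archive rather than my real files.

```python
import io
import zipfile

# Map the numeric compression method stored in the archive to a readable name.
METHOD_NAMES = {
    zipfile.ZIP_STORED: "stored",
    zipfile.ZIP_DEFLATED: "deflated",
    zipfile.ZIP_BZIP2: "bzip2",
    zipfile.ZIP_LZMA: "lzma",
}


def describe_archive(archive):
    """Return (name, method, uncompressed size, compressed size) per entry.

    `archive` may be a path or a seekable file-like object.
    """
    rows = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            method = METHOD_NAMES.get(info.compress_type, str(info.compress_type))
            rows.append((info.filename, method, info.file_size, info.compress_size))
    return rows


# Demo on a small in-memory archive (hypothetical sample data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.json", '{"key": "value"}' * 1000)
buffer.seek(0)

demo_rows = describe_archive(buffer)
for row in demo_rows:
    print(row)
```

Calling `describe_archive` on the archive written by ZipFile and on the one written by zip would show whether the entries differ in method or only in compressed size.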

I did see this similar question, but their issue was that the file was not compressing at all because they were using a ZipInfo object. My file does compress, but just not as small as with zip, and I'm not using a ZipInfo object.

Update

I ran some tests to see if the difference was caused by the compresslevel option. The man page for zip says the default compress level is 6. So I used Python's ZipFile and the zip command to compress the same file, first without a compress level value (the default), then by explicitly setting it to 6, and then 9.

In my program, I am generating the data and then writing it to a file. I wanted to see if there was a difference to compressing data stored in memory versus compressing data read from a file, so I also used ZipFile to compress the data in both of these scenarios.

I also tried using shutil for good measure. It doesn't appear to have an option for compress level, so I just have the default result.
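The three Python variants (data in memory, data read from a file, and shutil) can be compared side by side along these lines. Again, this is only a sketch with hypothetical file names and sample data, written to a temporary directory; it is not my actual test script.

```python
import json
import os
import shutil
import tempfile
import zipfile

# Hypothetical sample data standing in for the real generated data.
payload = json.dumps({"values": list(range(50000))})

with tempfile.TemporaryDirectory() as tmp:
    # Write the JSON to disk so it can also be compressed "from file".
    src_dir = os.path.join(tmp, "src")
    os.mkdir(src_dir)
    json_path = os.path.join(src_dir, "data.json")
    with open(json_path, "w") as f:
        f.write(payload)

    # 1) ZipFile, compressing the string held in memory.
    mem_zip = os.path.join(tmp, "from_memory.zip")
    with zipfile.ZipFile(mem_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.json", payload)

    # 2) ZipFile, compressing the file on disk.
    file_zip = os.path.join(tmp, "from_file.zip")
    with zipfile.ZipFile(file_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(json_path, arcname="data.json")

    # 3) shutil, archiving the directory that contains the file.
    shutil_zip = shutil.make_archive(
        os.path.join(tmp, "from_shutil"), "zip", root_dir=src_dir
    )

    result_sizes = {
        "memory": os.path.getsize(mem_zip),
        "file": os.path.getsize(file_zip),
        "shutil": os.path.getsize(shutil_zip),
    }

for label, size in result_sizes.items():
    print(f"{label}: {size} bytes")
```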

Here are the results.

Zip Method                Default   Level 6   Level 9
Unzipped                  33M       N/A       N/A
ZipFile Generated Data    3.1M      3.1M      2.9M
ZipFile From File         3.1M      2.1M      2.9M
zip From File             2.1M      4.2M      2.3M
shutil From File          2.1M      N/A       N/A

I find it very strange that 1) none of the results for ZipFile and zip match at the same compress level, and 2) explicitly setting the compress level to 6 for zip (which is supposedly the default) makes the file twice as large.

Any more thoughts as to what is going on?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow