zipfile in Python produces ZIP files that are not quite normal

In my project, a set of files is created and packed into a ZIP archive to be used on an Android mobile phone. The Android application opens these ZIP files to read initial data and then stores the results of its work in the same ZIPs. I have no access to the source code of this Android app or to the old script that generated the ZIP files before (in fact, I do not know how the old ZIP files were created). But the structure of the ZIP archive is known, and I have written a new Python script to produce the same files.

I ran into the following problem: the ZIP files produced by my script cannot be opened by the Android app (an error message about an incorrect file structure appears). But if I unpack all the contents and pack them back into a new ZIP file with the same name using WinZip, 7-Zip or "Send to -> Compressed (zipped) folder" (in Windows 7), the file is processed normally on the phone. This leads me to conclude that the problem is not in the Android application.

The code snippet for packing the folder into a ZIP was as follows:

import os
import shutil
import zipfile

# make zip (prefix is the name of the folder to pack, e.g. 'en_US_00001')
try:
    with zipfile.ZipFile(prefix + '.zip', 'w') as zipf:
        for root, dirs, files in os.walk(prefix):
            for file in files:
                zipf.write(os.path.join(root, file))
    # remove the folder that was packed
    shutil.rmtree(prefix)
    # report the result
    print('File ' + prefix + '.zip was created')
except Exception as exc:
    print('Unexpected error occurred while creating file ' + prefix + '.zip: ' + str(exc))

After I noticed that the files were not compressed, I added the compression option:

 zipfile.ZipFile(prefix + '.zip', 'w', zipfile.ZIP_DEFLATED) 

but this didn't solve my problem, and setting allowZip64=True didn't change the situation either.
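Besides the archive-wide compression and allowZip64 constructor arguments, zipfile also lets a single entry be configured through a zipfile.ZipInfo object (name, date_time, compress_type and so on); the solution further down relies on exactly this. A minimal sketch writing to an in-memory buffer; the entry names and the fixed timestamp are made-up examples:

```python
import io
import zipfile

buf = io.BytesIO()
# Archive-wide defaults: DEFLATE compression, ZIP64 allowed for large archives
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
    # A plain name inherits the archive-wide compression (ZIP_DEFLATED)
    zf.writestr('data/info.xml', '<info/>')
    # A ZipInfo lets one entry override the defaults; the name and the
    # fixed date below are hypothetical examples
    info = zipfile.ZipInfo('data/raw.bin', date_time=(2016, 1, 1, 12, 0, 0))
    info.compress_type = zipfile.ZIP_STORED
    zf.writestr(info, b'\x00' * 16)

# Read the archive back and collect (compress_type, date_time) per entry
with zipfile.ZipFile(buf) as zf:
    summary = {e.filename: (e.compress_type, e.date_time) for e in zf.infolist()}
print(summary)
```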

By the way, a ZIP file produced with zipfile.ZIP_DEFLATED is about 5 kilobytes smaller than the ZIP file produced by Windows and about 14 kilobytes smaller than 7-Zip's result for the same content. At the same time, I can open all of these ZIP files for visual comparison in both 7-Zip and Windows Explorer.

So I have three related questions:

1) What could cause this strange behavior of my zipfile-based script?

2) What other ways are there to influence the output of zipfile?

3) How can I check a ZIP file created with zipfile to find possible structural problems, or make sure there are none?
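For question 3, a minimal self-check might look like the sketch below. ZipFile.testzip() reads every member and verifies its CRC, but it says nothing about folder entries, so those are checked separately. Treating a missing 'folder/' entry as a problem is an assumption based only on the solution found later in this thread; check_zip is a hypothetical helper name.

```python
import io
import posixpath
import zipfile

def check_zip(source):
    """Hypothetical helper: list possible structural problems in a ZIP archive.

    source may be a path or a binary file-like object.
    """
    if not zipfile.is_zipfile(source):
        return ['not a ZIP file']
    problems = set()
    with zipfile.ZipFile(source) as zf:
        # testzip() returns the name of the first member with a bad CRC, or None
        bad = zf.testzip()
        if bad is not None:
            problems.add('CRC error in ' + bad)
        names = set(zf.namelist())
        for name in names:
            if '\\' in name:
                problems.add('backslash in name: ' + name)
            # Assumption: every parent folder should have its own 'folder/' entry
            parent = posixpath.dirname(name.rstrip('/'))
            while parent:
                if parent + '/' not in names:
                    problems.add('no directory entry for ' + parent + '/')
                parent = posixpath.dirname(parent)
    return sorted(problems)

# Usage: an archive like the one from the question (files only, no folder entries)
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('en_US_00001/en_US_00001_0001/info.xml', '<info/>')
print(check_zip(buf))
```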

Of course, if I have to give up on zipfile, I can use an external archiver (e.g. 7-Zip) to pack the files, but I would like to find an elegant solution if one exists.

UPDATE:

To check the content of the ZIP file created with zipfile, I did the following:

import os
import shutil
import zipfile

# make zip (prefix is the name of the folder to pack, e.g. 'en_US_00001')
flist = []
try:
    with zipfile.ZipFile(prefix + '.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(prefix):
            for file in files:
                path = os.path.join(root, file)
                zipf.write(path)
                # Store the expected entry name (ZIP names always use '/')
                flist.append(path.replace(os.sep, '/'))
    # remove the folder that was packed
    shutil.rmtree(prefix)
    # report the result
    print('File ' + prefix + '.zip was created')
except Exception as exc:
    print('Unexpected error occurred while creating file ' + prefix + '.zip: ' + str(exc))
# Check the zip: every stored entry should match an item in flist
with zipfile.ZipFile(prefix + '.zip') as zfile:
    for info in zfile.infolist():
        compress_name = 'ZIP_DEFLATED' if info.compress_type == zipfile.ZIP_DEFLATED else 'NOT ZIP_DEFLATED'
        print(info.filename + '  (extra = ' + str(info.extra) + '; compress_type = ' + compress_name + ')')
        # remove item from list
        if info.filename in flist:
            flist.remove(info.filename)
        else:
            print(info.filename + ' is unexpected item')
print('Number of items that were missed:')
print(len(flist))

and saw the following output:

File en_US_00001.zip was created
en_US_00001/en_US_00001_0001/en_US_00001_0001_big.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_info.xml  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_small.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.pkl  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.tex  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_user.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_big.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_info.xml  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_small.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.pkl  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.tex  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_user.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_big.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_info.xml  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_small.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.pkl  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.tex  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_user.png  (extra = b''; compress_type = ZIP_DEFLATED)
Number of items that were missed:
0

So everything that was written was read back, but the question remains: was everything that is necessary actually written? For example, in the comments Harold mentioned relative paths... perhaps that is the key to the answer.
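If the concern is which paths end up inside the archive, the arcname argument of ZipFile.write() sets the stored name explicitly, independent of the current working directory. A minimal sketch, with zip_folder as a hypothetical helper and a throwaway folder created in a temporary directory:

```python
import os
import tempfile
import zipfile

def zip_folder(folder, zip_path):
    """Hypothetical helper: pack folder into zip_path with explicit arcnames.

    Stored names are made relative to the folder's parent and always use '/',
    so the result does not depend on the current working directory.
    """
    base = os.path.dirname(os.path.abspath(folder))
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                # arcname is the name recorded inside the archive
                arcname = os.path.relpath(path, base).replace(os.sep, '/')
                zf.write(path, arcname)
    return zip_path

with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, 'en_US_00001', 'en_US_00001_0001')
    os.makedirs(sub)
    with open(os.path.join(sub, 'info.xml'), 'w') as f:
        f.write('<info/>')
    out = zip_folder(os.path.join(tmp, 'en_US_00001'), os.path.join(tmp, 'out.zip'))
    with zipfile.ZipFile(out) as zf:
        names = zf.namelist()
print(names)
```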

UPDATE 2

When I replaced zipfile with the external 7-Zip archiver, using this code:

import shutil
import subprocess
import zipfile

# make zip with the external 7-Zip archiver (7z.exe must be on PATH);
# check=True raises if 7-Zip fails, so the folder is not deleted by mistake
subprocess.run(["7z.exe", "a", prefix + ".zip", prefix], check=True)
shutil.rmtree(prefix)
# Check of zip
with zipfile.ZipFile(prefix + '.zip') as zfile:
    for info in zfile.infolist():
        print(info.filename)
        print('  (extra = ' + str(info.extra) + '; compress_type = ' + str(info.compress_type) + ')')
print('Values for compress_type:')
print(str(zipfile.ZIP_DEFLATED) + ' = ZIP_DEFLATED')
print(str(zipfile.ZIP_STORED) + ' = ZIP_STORED')

it produced the following result:

Creating archive en_US_00001.zip

Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_big.png
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_info.xml
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_small.png
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_source.pkl
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_source.tex
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_user.png
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_big.png
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_info.xml
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_small.png
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_source.pkl
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_source.tex
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_user.png
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_big.png
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_info.xml
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_small.png
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_source.pkl
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_source.tex
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_user.png

Everything is Ok

en_US_00001/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00Faf\xd2Y\xf9\xd1\x01Faf\xd2Y\xf9\xd1\x01%\xc9c\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0001/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xbe(e\xd2Y\xf9\xd1\x01\xbe(e\xd2Y\xf9\xd1\x016\xf0c\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0001/en_US_00001_0001_big.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00G\x17d\xd2Y\xf9\xd1\x01G\x17d\xd2Y\xf9\xd1\x01G\x17d\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_info.xml
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00X>d\xd2Y\xf9\xd1\x01X>d\xd2Y\xf9\xd1\x01X>d\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_small.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00z\x8cd\xd2Y\xf9\xd1\x01ied\xd2Y\xf9\xd1\x01ied\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.pkl
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x8b\xb3d\xd2Y\xf9\xd1\x01\x8b\xb3d\xd2Y\xf9\xd1\x01\x8b\xb3d\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.tex
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xad\x01e\xd2Y\xf9\xd1\x01\xad\x01e\xd2Y\xf9\xd1\x01\xad\x01e\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_user.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xbe(e\xd2Y\xf9\xd1\x01\xbe(e\xd2Y\xf9\xd1\x01\xbe(e\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x005:f\xd2Y\xf9\xd1\x015:f\xd2Y\xf9\xd1\x01\xcfOe\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0002/en_US_00001_0002_big.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xe0ve\xd2Y\xf9\xd1\x01\xcfOe\xd2Y\xf9\xd1\x01\xcfOe\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_info.xml
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xf1\x9de\xd2Y\xf9\xd1\x01\xe0ve\xd2Y\xf9\xd1\x01\xe0ve\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_small.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x02\xc5e\xd2Y\xf9\xd1\x01\x02\xc5e\xd2Y\xf9\xd1\x01\x02\xc5e\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.pkl
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x13\xece\xd2Y\xf9\xd1\x01\x13\xece\xd2Y\xf9\xd1\x01\x13\xece\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.tex
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00$\x13f\xd2Y\xf9\xd1\x01$\x13f\xd2Y\xf9\xd1\x01$\x13f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_user.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x005:f\xd2Y\xf9\xd1\x015:f\xd2Y\xf9\xd1\x015:f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xdf\xc0g\xd2Y\xf9\xd1\x01\xdf\xc0g\xd2Y\xf9\xd1\x01Faf\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0003/en_US_00001_0003_big.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00W\x88f\xd2Y\xf9\xd1\x01W\x88f\xd2Y\xf9\xd1\x01W\x88f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_info.xml
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00h\xaff\xd2Y\xf9\xd1\x01h\xaff\xd2Y\xf9\xd1\x01h\xaff\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_small.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x9b$g\xd2Y\xf9\xd1\x01y\xd6f\xd2Y\xf9\xd1\x01y\xd6f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.pkl
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xacKg\xd2Y\xf9\xd1\x01\xacKg\xd2Y\xf9\xd1\x01\xacKg\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.tex
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xce\x99g\xd2Y\xf9\xd1\x01\xce\x99g\xd2Y\xf9\xd1\x01\xce\x99g\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_user.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xdf\xc0g\xd2Y\xf9\xd1\x01\xdf\xc0g\xd2Y\xf9\xd1\x01\xdf\xc0g\xd2Y\xf9\xd1\x01'; compress_type = 8)

Values for compress_type:
8 = ZIP_DEFLATED
0 = ZIP_STORED
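Comparing two such listings by eye is error-prone. The sketch below reports names present in only one archive and shared names whose compress_type differs; compare_zips is a hypothetical helper, and the two in-memory archives are made up to mimic the zipfile and 7-Zip results above.

```python
import io
import zipfile

def compare_zips(zip_a, zip_b):
    """Hypothetical helper: compare the entry lists of two ZIP archives.

    Returns (names only in zip_a, names only in zip_b,
             shared names whose compress_type differs).
    """
    with zipfile.ZipFile(zip_a) as za, zipfile.ZipFile(zip_b) as zb:
        infos_a = {i.filename: i for i in za.infolist()}
        infos_b = {i.filename: i for i in zb.infolist()}
    only_a = sorted(infos_a.keys() - infos_b.keys())
    only_b = sorted(infos_b.keys() - infos_a.keys())
    shared = infos_a.keys() & infos_b.keys()
    type_diff = sorted(n for n in shared
                       if infos_a[n].compress_type != infos_b[n].compress_type)
    return only_a, only_b, type_diff

# Usage with two made-up archives: files only vs. files plus a folder entry
mine, reference = io.BytesIO(), io.BytesIO()
with zipfile.ZipFile(mine, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('en_US_00001/info.xml', '<info/>')
with zipfile.ZipFile(reference, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(zipfile.ZipInfo('en_US_00001/'), '')  # ZipInfo defaults to ZIP_STORED
    zf.writestr('en_US_00001/info.xml', '<info/>')
print(compare_zips(mine, reference))
```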

As I understand it, the most important findings are:

  • there are entries for folders (e.g. en_US_00001/, en_US_00001/en_US_00001_0001/), which were absent from the ZIP produced by my zipfile code
  • folder entries have compress_type == ZIP_STORED, while file entries have compress_type == ZIP_DEFLATED
  • the extra fields differ: 7-Zip writes fairly long byte strings, while my zipfile code left them empty (b'')
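The long extra values written by 7-Zip appear to be the NTFS extra field (header ID 0x000A in the PKWARE ZIP specification), which holds Windows FILETIME timestamps (modification, access, creation) rather than structural information. A sketch that decodes the value printed for en_US_00001_0001_big.png above; parse_ntfs_extra is a hypothetical helper, and the byte layout is taken from the specification, not from 7-Zip's documentation.

```python
import struct
from datetime import datetime, timedelta

# FILETIME values count 100-nanosecond intervals since 1601-01-01 (UTC)
FILETIME_EPOCH = datetime(1601, 1, 1)

def parse_ntfs_extra(extra):
    """Hypothetical helper: decode NTFS (0x000A) timestamps in a ZIP extra field.

    Returns one (mtime, atime, ctime) tuple of naive UTC datetimes per
    NTFS block carrying attribute tag 0x0001; other blocks are skipped.
    """
    results = []
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from('<HH', extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        if header_id == 0x000A:
            sub = 4  # skip the 4 reserved bytes
            while sub + 4 <= len(data):
                tag, tag_size = struct.unpack_from('<HH', data, sub)
                if tag == 0x0001 and tag_size == 24 and sub + 28 <= len(data):
                    raw = struct.unpack_from('<QQQ', data, sub + 4)
                    results.append(tuple(FILETIME_EPOCH + timedelta(microseconds=t // 10)
                                         for t in raw))
                sub += 4 + tag_size
        pos += 4 + size
    return results

# extra field printed for en_US_00001_0001_big.png in the 7-Zip listing
extra = (b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00'
         b'G\x17d\xd2Y\xf9\xd1\x01G\x17d\xd2Y\xf9\xd1\x01G\x17d\xd2Y\xf9\xd1\x01')
stamps = parse_ntfs_extra(extra)
print(stamps)
```

For this sample all three timestamps decode to 18 August 2016 (UTC). Since the archive in the solution below keeps the extra fields empty and is still accepted by the app, the NTFS field does not seem to be what the app checks.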


Solution 1:[1]

Based on the differences listed in UPDATE 2 of the question, and on examples from another question about zipfile, I tried the following code to add directory entries to the ZIP file and check the result:

import os
import shutil
import zipfile

# make zip (prefix is the name of the folder to pack, e.g. 'en_US_00001')
try:
    with zipfile.ZipFile(prefix + '.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
        # Entry for the top-level folder; the trailing '/' marks a directory
        # and a bare ZipInfo defaults to ZIP_STORED
        info = zipfile.ZipInfo(prefix + '/')
        zipf.writestr(info, '')
        for root, dirs, files in os.walk(prefix):
            for d in dirs:
                # Entry for every subfolder; ZIP names always use '/'
                dir_name = os.path.join(root, d).replace(os.sep, '/') + '/'
                zipf.writestr(zipfile.ZipInfo(dir_name), '')
            for file in files:
                zipf.write(os.path.join(root, file))
    # remove the folder that was packed
    shutil.rmtree(prefix)
    # report the result
    print('File ' + prefix + '.zip was created')
except Exception as exc:
    print('Unexpected error occurred while creating file ' + prefix + '.zip: ' + str(exc))
# Check zip content
with zipfile.ZipFile(prefix + '.zip') as zfile:
    for info in zfile.infolist():
        print(info.filename)
        print('  (extra = ' + str(info.extra) + '; compress_type = ' + str(info.compress_type) + ')')
print('Values for compress_type:')
print(str(zipfile.ZIP_DEFLATED) + ' = ZIP_DEFLATED')
print(str(zipfile.ZIP_STORED) + ' = ZIP_STORED')

The output is:

File en_US_00001.zip was created
en_US_00001/
    (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0001/
    (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0002/
    (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0003/
    (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0001/en_US_00001_0001_big.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_info.xml
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_small.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.pkl
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.tex
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_user.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_big.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_info.xml
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_small.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.pkl
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.tex
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_user.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_big.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_info.xml
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_small.png
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.pkl
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.tex
    (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_user.png
    (extra = b''; compress_type = 8)
Values for compress_type:
8 = ZIP_DEFLATED
0 = ZIP_STORED

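Adding a trailing slash to the directory names (+'\\' or +'/') turned out to be mandatory. Note that '\\' only works on Windows, where zipfile.ZipInfo converts the OS path separator to '/'; '/' is the separator the ZIP format itself uses, so it works on every platform.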

And most importantly, the ZIP file is now properly accepted by the Android application.
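The directory entries can probably also be produced without building ZipInfo objects by hand: ZipFile.write() called on a directory path stores an entry whose name ends in '/', with the folder's own timestamp and the MS-DOS directory attribute. This is a swapped-in alternative to the code above, not what was tested on the phone; zip_folder_with_dirs is a hypothetical helper name.

```python
import os
import tempfile
import zipfile

def zip_folder_with_dirs(folder, zip_path):
    """Hypothetical helper: pack folder with an explicit entry for every directory."""
    base = os.path.dirname(os.path.abspath(folder))
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(folder):
            # write() on a directory stores an empty entry named 'folder/'
            zf.write(root, os.path.relpath(root, base))
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, os.path.relpath(path, base))
    return zip_path

with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, 'en_US_00001', 'en_US_00001_0001')
    os.makedirs(sub)
    with open(os.path.join(sub, 'info.xml'), 'w') as f:
        f.write('<info/>')
    out = zip_folder_with_dirs(os.path.join(tmp, 'en_US_00001'), os.path.join(tmp, 'out.zip'))
    with zipfile.ZipFile(out) as zf:
        names = zf.namelist()
        types = {i.filename: i.compress_type for i in zf.infolist()}
print(names)
```

Unlike the ZipInfo approach above, this records each folder's real modification time instead of the ZipInfo default of 1980-01-01.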

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source: Solution 1 by Community