Problem with the Python zipfile library if you share a file between Linux and Windows [duplicate]

The zipfile module is very useful for managing .zip files with Python.

However, if the .zip file was created on Linux or macOS, the separator is of course '/', and working with this file on Windows can be a problem because the separator there is '\'. For example, to determine the root directory stored in the .zip file, we might write something like:

from zipfile import ZipFile, is_zipfile
import os

if is_zipfile(filename):

    with ZipFile(filename, 'r') as zip_ref:
        packages_name = [member.split(os.sep)[0] for member in zip_ref.namelist()
                         if (len(member.split(os.sep)) == 2 and not
                                                       member.split(os.sep)[-1])]

But in this case, on Windows we always get packages_name = [] because os.sep is '\' there, whereas the paths in an archive compressed on Linux look like 'foo1/foo2'.

In order to handle all cases (compression on Linux and use on Windows, or the opposite), I want to use:

from zipfile import ZipFile, is_zipfile
import os

if is_zipfile(filename):

    with ZipFile(filename, 'r') as zip_ref:

        if all([True if '/' in el else
                False for el in zip_ref.namelist()]):
            packages_name = [member.split('/')[0] for member in zip_ref.namelist()
                             if (len(member.split('/')) == 2 and not
                                                       member.split('/')[-1])]

        else:
            packages_name = [member.split('\\')[0] for member in zip_ref.namelist()
                             if (len(member.split('\\')) == 2 and not
                                                           member.split('\\')[-1])]

What do you think of this? Is there a more direct or more pythonic way to do the job?



Solution 1:[1]

Thanks to @snakecharmerb's answer and the link he proposed, I now understand. Thank you @snakecharmerb for showing me the way. As described in that link, zipfile internally uses only '/', independently of the OS. As I like to see things concretely, I ran this little test:

  • On Windows, I used the usual tools of the OS (not the command line) to create a file testZipWindows.zip containing this tree structure:

    • testZipWindows
      • foo1.txt
      • InFolder
        • foo2.txt
  • I did the same thing on Linux (again without using the command line) for the testZipFedora.zip archive:

    • testZipFedora
      • foo1.txt
      • InFolder
        • foo2.txt

This is the result:

$ python3
Python 3.7.9 (default, Aug 19 2020, 17:05:11) 
[GCC 9.3.1 20200408 (Red Hat 9.3.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from zipfile import ZipFile
>>> with ZipFile('/home/servoz/Desktop/test/testZipWindows.zip', 'r') as WinZip:
...  WinZip.namelist()
... 
['testZipWindows/', 'testZipWindows/foo1.txt', 'testZipWindows/InFolder/', 'testZipWindows/InFolder/foo2.txt']
>>> with ZipFile('/home/servoz/Desktop/test/testZipFedora.zip', 'r') as fedZip:
...  fedZip.namelist()
... 
['testZipFedora/', 'testZipFedora/foo1.txt', 'testZipFedora/InFolder/', 'testZipFedora/InFolder/foo2.txt']

So it all becomes clear! We must indeed use os.path.sep for file system paths to work properly across platforms, but when dealing with member names in the zipfile library it is absolutely necessary to use '/' as the separator and not os.sep (or os.path.sep). That was my mistake!
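The same check can be reproduced without any files on disk. The following is only a minimal sketch: the archive is built in memory, and the member names ('pkg/', 'pkg/foo1.txt', ...) are made up for illustration. It compares splitting on '/' with splitting on '\', which is what os.sep is on Windows.

```python
import io
from zipfile import ZipFile

# Build a small archive in memory (member names are illustrative only).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("pkg/", "")  # explicit directory entry
    archive.writestr("pkg/foo1.txt", "one")
    archive.writestr("pkg/InFolder/foo2.txt", "two")

with ZipFile(buffer) as archive:
    names = archive.namelist()


def roots(member_names, sep):
    """Same filter as in the question, with the separator as a parameter."""
    return [name.split(sep)[0] for name in member_names
            if len(name.split(sep)) == 2 and not name.split(sep)[-1]]


with_slash = roots(names, "/")       # separator used inside the archive
with_backslash = roots(names, "\\")  # value of os.sep on Windows

print(names)           # ['pkg/', 'pkg/foo1.txt', 'pkg/InFolder/foo2.txt']
print(with_slash)      # ['pkg']
print(with_backslash)  # []
```

Splitting on the backslash never matches anything, which is exactly the empty list described in the question.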

So the code to use in a multiplatform way for the example of my first post is just:

from zipfile import ZipFile, is_zipfile

if is_zipfile(filename):

    with ZipFile(filename, 'r') as zip_ref:
        # Member names inside an archive use '/', whatever the OS
        packages_name = [member.split('/')[0] for member in zip_ref.namelist()
                         if (len(member.split('/')) == 2 and not
                             member.split('/')[-1])]

And not all the useless things I had imagined...
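As a possible variant, here is a sketch that avoids splitting strings by hand. It relies on pathlib.PurePosixPath, which always treats '/' as the separator, and on ZipInfo.is_dir(). The helper name top_level_dirs is hypothetical and not part of zipfile. Unlike the list comprehension above, it also reports a root directory when the archive stores no explicit 'dir/' entry for it, which some zip tools omit.

```python
import io
from pathlib import PurePosixPath
from zipfile import ZipFile


def top_level_dirs(zip_ref):
    """Return the sorted top-level directory names of an open ZipFile.

    Hypothetical helper for illustration; not part of the zipfile module.
    """
    found = set()
    for info in zip_ref.infolist():
        # PurePosixPath splits on '/' on every platform.
        parts = PurePosixPath(info.filename).parts
        # 'pkg/' is a directory entry; 'pkg/a.txt' implies the directory 'pkg'.
        if parts and (info.is_dir() or len(parts) > 1):
            found.add(parts[0])
    return sorted(found)


# Small in-memory archive to try it out (names are illustrative only).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("testZip/", "")
    archive.writestr("testZip/foo1.txt", "one")
    archive.writestr("testZip/InFolder/", "")
    archive.writestr("testZip/InFolder/foo2.txt", "two")
    archive.writestr("README.txt", "top-level file, not a directory")

with ZipFile(buffer) as archive:
    result = top_level_dirs(archive)

print(result)  # ['testZip']
```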

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 servoz