Python zipfile does not unzip folders for Windows zip archive

I have a zip file that was created on a Windows machine using System.IO.Compression.ZipFile (the archive contains many files and folders). I have Python code running on a Linux machine (a Raspberry Pi, to be exact) that has to unzip the archive and create all the necessary folders and files. I'm using Python 3.5.0 and the zipfile library. This is a code sample:

import zipfile

# "archive" avoids shadowing the built-in zip(); the with block closes the file
with zipfile.ZipFile("MyArchive.zip", "r") as archive:
    archive.extractall()

Now when I run this code, instead of getting a nice unzipped directory tree, I get all the files in the root directory with weird names like Folder1\Folder2\MyFile.txt.

My assumption is that since the zip archive was created on Windows, where the directory separator is \, whereas on Linux it is /, the Python zipfile library treats \ as part of the file name instead of as a directory separator. Also note that when I extract this archive manually (not through Python code), all the folders are created as expected, so this seems to be a problem in the zipfile library. Another note: for zip archives that were created with a different tool (not System.IO.Compression.ZipFile), the same Python code works fine.
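One way to check this assumption is to list the entry names stored in the archive and look for backslashes. The sketch below is only a diagnostic. The helper name `entries_with_backslashes` and the in-memory demo archive are made up for illustration. The demo assumes it runs on Linux, where zipfile stores the backslashes unchanged when writing.

```python
import io
import zipfile


def entries_with_backslashes(archive):
    """Return the stored entry names that contain a backslash.

    `archive` may be a path or a file-like object, as accepted by ZipFile.
    """
    with zipfile.ZipFile(archive, "r") as zf:
        return [name for name in zf.namelist() if "\\" in name]


# Demo archive built in memory (hypothetical contents, not the real file).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("Folder1\\Folder2\\MyFile.txt", "hello")
    zf.writestr("Other/File.txt", "world")

for name in entries_with_backslashes(demo):
    print(repr(name))
```

Calling `entries_with_backslashes("MyArchive.zip")` on the real archive should list the affected entries if the assumption holds.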

Any insight on what's going on and how to fix it?



Solution 1:[1]

What is happening is that while Windows recognizes both \ (os.path.sep) and / (os.path.altsep) as path separators, Linux only recognizes / (os.path.sep) and has no alternate separator.

As @blhsing's answer shows, the existing implementation of ZipFile always treats os.path.sep and / as valid separator characters. That means that on Linux, \ is treated as a literal part of the file name. To change that, you can set os.path.altsep to \, since the extraction code also converts it whenever it is not None or empty.
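A minimal sketch of that idea, assuming a POSIX system and a zipfile version whose extraction code consults os.path.altsep as described. The names `backslash_as_altsep` and `extract_with_altsep` are hypothetical, and the demo archive is built in memory for illustration. Because os.path.altsep is process-wide state, the sketch restores the old value afterwards.

```python
import io
import os
import tempfile
import zipfile
from contextlib import contextmanager


@contextmanager
def backslash_as_altsep():
    """Temporarily make a backslash the alternate path separator."""
    saved = os.path.altsep
    os.path.altsep = "\\"  # process-wide; other os.path users see it too
    try:
        yield
    finally:
        os.path.altsep = saved  # always restore the original value


def extract_with_altsep(archive, dest):
    """Extract `archive` into `dest`, treating backslashes as separators."""
    with backslash_as_altsep(), zipfile.ZipFile(archive, "r") as zf:
        zf.extractall(dest)


# Demo archive with a Windows-style entry name (hypothetical contents).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("Folder1\\Folder2\\MyFile.txt", "hello")

with tempfile.TemporaryDirectory() as dest:
    extract_with_altsep(demo, dest)
    expected = os.path.join(dest, "Folder1", "Folder2", "MyFile.txt")
    extracted_ok = os.path.isfile(expected)

print(extracted_ok)
```

Since this changes a module-level constant, it is best kept to short scripts or wrapped as above; other code that reads os.path.altsep during the block would see the changed value.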

If you go down the road of modifying ZipFile itself, as the other answer suggests, just add a line that blindly changes \ to os.path.sep, since / is always changed already anyway. That way, /, \ and possibly os.path.altsep will all be converted to os.path.sep. This is what the command-line tool appears to be doing.
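The same blind conversion can also be sketched without touching ZipFile internals, by rewriting each entry's name before extracting it. This is a variant of the idea above, not the patch itself. The function name `extract_normalizing_backslashes` and the demo archive are assumptions made for illustration.

```python
import io
import os
import tempfile
import zipfile


def extract_normalizing_backslashes(archive, dest):
    """Extract every member, treating backslashes in stored names as '/'."""
    with zipfile.ZipFile(archive, "r") as zf:
        for info in zf.infolist():
            # Only the name used to build the target path is changed;
            # info.orig_filename still matches the name stored in the file,
            # which is what ZipFile.open() checks against the local header.
            info.filename = info.filename.replace("\\", "/")
            zf.extract(info, dest)


# Demo archive with a Windows-style entry name (hypothetical contents).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("Folder1\\Folder2\\MyFile.txt", "hello")

with tempfile.TemporaryDirectory() as dest:
    extract_normalizing_backslashes(demo, dest)
    expected = os.path.join(dest, "Folder1", "Folder2", "MyFile.txt")
    normalized_ok = os.path.isfile(expected)

print(normalized_ok)
```

This keeps the change local to one archive, at the cost of treating every backslash as a separator even in the rare case where it is a legitimate part of a file name.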

Solution 2:[2]

This is indeed a bug in the zipfile module: ZipFile._extract_member() contains the following line, which blindly replaces '/' in the file names with the OS-specific path separator, when it should also look for '\\':

arcname = member.filename.replace('/', os.path.sep)
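
To see why that line has no effect on a backslash-separated name under Linux, the sketch below replays it with `posixpath`, which gives the Linux behaviour on any platform. The helper name `arcname_on_posix` is made up for illustration and is not part of zipfile.

```python
import posixpath


def arcname_on_posix(filename):
    """Replay the quoted replace() using the POSIX separator."""
    # posixpath.sep is '/', so this replaces '/' with '/': a no-op.
    return filename.replace("/", posixpath.sep)


stored_name = "Folder1\\Folder2\\MyFile.txt"
converted = arcname_on_posix(stored_name)

# The name comes back unchanged, backslashes included.
print(converted == stored_name)
# No '/' in the name, so POSIX sees no directory part at all.
print(repr(posixpath.dirname(converted)))
```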

You can fix this by overriding ZipFile._extract_member() with a version that's directly copied from the source code but with the above line corrected:

from zipfile import ZipFile, ZipInfo
import shutil
import os


def _extract_member(self, member, targetpath, pwd):
    """Extract the ZipInfo object 'member' to a physical
       file on the path targetpath.
    """
    if not isinstance(member, ZipInfo):
        member = self.getinfo(member)

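    # The corrected part: on POSIX, also treat Windows-style backslashes
    # as separators; elsewhere, keep the original '/' conversion.
    if os.path.sep == '/':
        arcname = member.filename.replace('\\', os.path.sep)
    else:
        arcname = member.filename.replace('/', os.path.sep)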

    if os.path.altsep:
        arcname = arcname.replace(os.path.altsep, os.path.sep)
    # interpret absolute pathname as relative, remove drive letter or
    # UNC path, redundant separators, "." and ".." components.
    arcname = os.path.splitdrive(arcname)[1]
    invalid_path_parts = ('', os.path.curdir, os.path.pardir)
    arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
                               if x not in invalid_path_parts)
    if os.path.sep == '\\':
        # filter illegal characters on Windows
        arcname = self._sanitize_windows_name(arcname, os.path.sep)

    targetpath = os.path.join(targetpath, arcname)
    targetpath = os.path.normpath(targetpath)

    # Create all upper directories if necessary.
    upperdirs = os.path.dirname(targetpath)
    if upperdirs and not os.path.exists(upperdirs):
        os.makedirs(upperdirs)

    if member.is_dir():
        if not os.path.isdir(targetpath):
            os.mkdir(targetpath)
        return targetpath

    with self.open(member, pwd=pwd) as source, \
            open(targetpath, "wb") as target:
        shutil.copyfileobj(source, target)

    return targetpath


# Install the patched method; later extract()/extractall() calls will use it.
ZipFile._extract_member = _extract_member

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2