Python zipfile does not unzip folders for Windows zip archive
I have a zip file which was created on a Windows machine using System.IO.Compression.ZipFile (this zip archive contains many files and folders). I have Python code that runs on a Linux machine (a Raspberry Pi, to be exact) which has to unzip the archive and create all the necessary folders and files. I'm using Python 3.5.0 and the zipfile library; here is sample code:
```python
import zipfile

# Extract everything into the current working directory.
with zipfile.ZipFile("MyArchive.zip", "r") as archive:
    archive.extractall()
```
Now when I run this code, instead of getting a nice unzipped directory tree, I get all the files in the root directory with weird names like `Folder1\Folder2\MyFile.txt`.
My assumption is that since the zip archive was created on Windows, where the directory separator is `\` (whereas on Linux it is `/`), the Python zipfile library treats `\` as part of a file name instead of as a directory separator. Also note that when I extract this archive manually (not through Python code), all the folders are created as expected, so this seems to be a problem with the zipfile library. Another note: for zip archives that were created with a different tool (not System.IO.Compression.ZipFile), the same Python code works OK.
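One way to check that assumption is to list the entry names exactly as they are stored in the archive. The sketch below builds a small in-memory archive as a hypothetical stand-in for MyArchive.zip (run it on Linux; `list_entry_names` is an illustrative name, not part of zipfile) and prints each stored name.

```python
import io
import zipfile


def list_entry_names(archive):
    """Return the entry names exactly as stored in the archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


# Hypothetical stand-in for MyArchive.zip: an in-memory archive whose entry
# name uses "\" separators, as the archive in the question apparently does.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Folder1\\Folder2\\MyFile.txt", "hello")

names = list_entry_names(buffer)
for name in names:
    # On Linux a "\" inside a name is just an ordinary character.
    print(name, "<- contains backslashes" if "\\" in name else "")
```

Calling `list_entry_names("MyArchive.zip")` on the real archive should show whether its names contain `\`.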
Any insight on what's going on and how to fix it?
Solution 1:[1]
What is happening is that while Windows recognizes both `\` (`os.path.sep`) and `/` (`os.path.altsep`) as path separators, Linux only recognizes `/` (`os.path.sep`).
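Both flavors of path handling ship with the standard library on every platform: `ntpath` is what `os.path` is on Windows, and `posixpath` is what it is on Linux. Comparing their `sep` and `altsep` attributes, and how each splits the same entry name, illustrates the asymmetry described above.

```python
import ntpath
import posixpath

# Windows rules: "\" is the separator and "/" is accepted as an alternative.
print(repr(ntpath.sep), repr(ntpath.altsep))        # '\\' '/'
# POSIX rules: only "/" is a separator; there is no alternative.
print(repr(posixpath.sep), repr(posixpath.altsep))  # '/' None

name = "Folder1\\Folder2\\MyFile.txt"
# Under POSIX rules the whole string is a single path component.
print(posixpath.split(name))  # ('', 'Folder1\\Folder2\\MyFile.txt')
# Under Windows rules it splits into a directory part and a file name.
print(ntpath.split(name))     # ('Folder1\\Folder2', 'MyFile.txt')
```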
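As @blhsing's answer shows, the existing implementation of ZipFile always treats `os.path.sep` and `/` as valid separator characters. That means that on Linux, `\` is treated as a literal part of the file name. To change that, you can set `os.path.altsep` to `\`, since ZipFile also converts `os.path.altsep` whenever it is not None or empty. Note that it has to be `os.path.altsep`: `os.altsep` is a separate copy of the value, and reassigning it does not affect zipfile.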
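A minimal sketch of that approach, assuming extraction happens on Linux/POSIX. Because `os.path.altsep` is process-wide state, the hypothetical context manager `backslash_as_altsep()` only changes it for the duration of the extraction and restores it afterwards; `extract_windows_zip()` is likewise an illustrative name. This relies on an implementation detail of `ZipFile._extract_member()` (its `os.path.altsep` check), so it is worth re-testing after Python upgrades.

```python
import contextlib
import io
import os
import tempfile
import zipfile


@contextlib.contextmanager
def backslash_as_altsep():
    """Hypothetical helper: temporarily let os.path treat "\\" as altsep."""
    if os.path.sep != "/":
        # Windows already treats "\" as a separator; nothing to patch.
        yield
        return
    saved = os.path.altsep  # None on POSIX
    os.path.altsep = "\\"
    try:
        yield
    finally:
        os.path.altsep = saved  # always restore the process-wide value


def extract_windows_zip(archive, dest):
    """Hypothetical helper: extract an archive whose names use "\\"."""
    with zipfile.ZipFile(archive) as zf, backslash_as_altsep():
        zf.extractall(dest)


# Example: an in-memory archive with backslash-separated entry names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Folder1\\Folder2\\MyFile.txt", "hello")

dest = tempfile.mkdtemp()
extract_windows_zip(buffer, dest)
extracted = os.path.join(dest, "Folder1", "Folder2", "MyFile.txt")
print(os.path.isfile(extracted))
```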
If you go down the road of modifying ZipFile itself, as the other answer suggests, just add a line that blindly changes `\` to `os.path.sep`, since `/` is always converted already anyway. That way, `/`, `\` and, where it is defined, `os.path.altsep` will all be converted to `os.path.sep`. This appears to be what the command-line tool does.
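A sketch of that one-line change, placed in a subclass rather than edited into ZipFile itself so the standard class is left alone. The class name `BackslashZipFile` is hypothetical. It rewrites `\` to `/` in the entry name and then delegates to the stock `_extract_member()`, which converts `/` to the local separator as usual; as a side effect, Windows-style directory entries ending in `\` are recognized as directories. It still overrides a private method, so treat it as tied to the CPython versions you test it on.

```python
import io
import os
import tempfile
import zipfile


class BackslashZipFile(zipfile.ZipFile):
    """Hypothetical ZipFile variant that also treats "\\" as a separator."""

    def _extract_member(self, member, targetpath, pwd):
        if not isinstance(member, zipfile.ZipInfo):
            member = self.getinfo(member)
        original = member.filename
        # The one-line change: turn "\" into "/", which the stock
        # implementation then converts to os.path.sep.
        member.filename = original.replace("\\", "/")
        try:
            return super()._extract_member(member, targetpath, pwd)
        finally:
            member.filename = original  # leave the ZipInfo as it was


# Example: an in-memory archive with backslash-separated entry names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Folder1\\Folder2\\MyFile.txt", "hello")

dest = tempfile.mkdtemp()
with BackslashZipFile(buffer) as zf:
    zf.extractall(dest)
print(sorted(os.listdir(dest)))
```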
Solution 2:[2]
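This is a limitation of the zipfile module: ZipFile._extract_member() contains the following line, which blindly replaces '/' in file names with the OS-specific path separator without also looking for '\\' (strictly speaking, the ZIP specification requires '/' as the separator, so an archive that stores '\\' is non-conforming, and zipfile simply does not tolerate it):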
```python
arcname = member.filename.replace('/', os.path.sep)
```
You can work around this by overriding ZipFile._extract_member() with a version copied from the zipfile source code, but with the above line corrected:
```python
import os
import shutil
from zipfile import ZipFile, ZipInfo


def _extract_member(self, member, targetpath, pwd):
    """Extract the ZipInfo object 'member' to a physical
       file on the path targetpath.
    """
    if not isinstance(member, ZipInfo):
        member = self.getinfo(member)

    # Build the destination pathname. On POSIX '/' is already the
    # separator, so only '\\' needs converting; elsewhere convert '/'.
    if os.path.sep == '/':
        arcname = member.filename.replace('\\', os.path.sep)
    else:
        arcname = member.filename.replace('/', os.path.sep)

    if os.path.altsep:
        arcname = arcname.replace(os.path.altsep, os.path.sep)
    # interpret absolute pathname as relative, remove drive letter or
    # UNC path, redundant separators, "." and ".." components.
    arcname = os.path.splitdrive(arcname)[1]
    invalid_path_parts = ('', os.path.curdir, os.path.pardir)
    arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
                               if x not in invalid_path_parts)
    if os.path.sep == '\\':
        # filter illegal characters on Windows
        arcname = self._sanitize_windows_name(arcname, os.path.sep)

    targetpath = os.path.join(targetpath, arcname)
    targetpath = os.path.normpath(targetpath)

    # Create all upper directories if necessary.
    upperdirs = os.path.dirname(targetpath)
    if upperdirs and not os.path.exists(upperdirs):
        os.makedirs(upperdirs)

    # Directory entries end with a separator. Checking both '/' and '\\'
    # (instead of member.is_dir(), which is Python 3.6+ and only checks '/')
    # keeps this working on 3.5 and for Windows-style directory entries.
    if member.filename.endswith(('/', '\\')):
        if not os.path.isdir(targetpath):
            os.mkdir(targetpath)
        return targetpath

    with self.open(member, pwd=pwd) as source, \
         open(targetpath, "wb") as target:
        shutil.copyfileobj(source, target)

    return targetpath


ZipFile._extract_member = _extract_member
```
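Overriding or monkey-patching a private method ties the fix to the Python version it was copied from; for example, `ZipInfo.is_dir()` only exists from Python 3.6, while the question uses 3.5.0. An alternative that sticks to documented attributes is to rewrite each entry's `ZipInfo.filename` before passing the `ZipInfo` to `ZipFile.extract()`. This is a sketch, not the method of either answer, and `extract_all_normalized` is a hypothetical name. It assumes, as current CPython does, that zipfile validates each entry's local header against `ZipInfo.orig_filename` rather than `filename`, so rewriting `filename` only changes the target path.

```python
import io
import os
import tempfile
import zipfile


def extract_all_normalized(archive, dest):
    """Hypothetical helper: extract every entry, treating "\\" as "/"."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Rewrite the name extract() uses to build the target path;
            # orig_filename (used to validate the entry) stays unchanged.
            info.filename = info.filename.replace("\\", "/")
            zf.extract(info, dest)


# Example: an in-memory archive with a Windows-style file and directory entry.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Folder1\\Folder2\\MyFile.txt", "hello")
    zf.writestr("Folder1\\Empty\\", "")  # directory entry ending in "\"

dest = tempfile.mkdtemp()
extract_all_normalized(buffer, dest)
```

For the question's archive this would be called as `extract_all_normalized("MyArchive.zip", ".")`.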
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |