zipfile extract zip to a folder

Below is what my file structure looks like:

music_folder
    album1.zip (below are contents inside of zip)
        song1.mp3
        song2.mp3
        song3.mp3
    album2.zip (below are contents inside of zip)
        song12.mp3
        song14.mp3
        song16.mp3

I want to extract both zipped albums into a directory called cache and I want the same structure. This is what I want it to look like:

cache
    album1 (this is a normal unzipped folder)
        song1.mp3
        song2.mp3
        song3.mp3
    album2 (this is a normal unzipped folder)
        song12.mp3
        song14.mp3
        song16.mp3

But for some reason, the files from album1 are extracted directly into the cache directory instead of cache/album1.

This is what it looks like and I don't want this:

cache
    song1.mp3
    song2.mp3
    song3.mp3
    album2 (this is a normal unzipped folder)
        song12.mp3
        song14.mp3
        song16.mp3

Below is my code:

import os
from zipfile import ZipFile

for zipped_album in os.listdir('music_folder'):
    zip_ref = ZipFile('music_folder/' + zipped_album, 'r')
    zip_ref.extractall('cache')
    zip_ref.close()

Any ideas why the files are not extracted into a folder inside cache for album1? It works for album2.



Solution 1:[1]

Zip files can contain (relative) pathnames, not just filenames.

So, the contents of album2.zip are most likely actually:

  • album2/song12.mp3
  • album2/song14.mp3
  • album2/song16.mp3

… but album1.zip is just:

  • song1.mp3
  • song2.mp3
  • song3.mp3

To test this, you can do unzip -l album1.zip and unzip -l album2.zip from your shell.
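
If you'd rather check from Python, the same listing is available through ZipFile.namelist(). A minimal sketch, where list_entries is just an illustrative helper name and not something from the question:

```python
from zipfile import ZipFile


def list_entries(zip_path):
    """Return the member names stored in the archive, exactly as recorded."""
    with ZipFile(zip_path) as zf:
        return zf.namelist()
```

Calling list_entries('music_folder/album1.zip') and list_entries('music_folder/album2.zip') should show whether the names in each archive carry a leading directory.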


This is a problem people have had for as long as they've been sharing zipfiles. You usually want that album2 prefix in the paths, but sometimes it's missing. You don't want to add it unconditionally and end up with album2/album2/song12.mp3, but you also don't want to leave it out and end up with song1.mp3 loose in the top directory.

The solution that most GUI tools use nowadays (I think it dates back to the ancient StuffIt Expander) is this:

  • Iterate over all of the zip entries and see if the pathnames all start with the same directory.
  • If so, unzip them as-is.
  • If not, create a directory with the same name as the zipfile (minus the .zip), then unzip them into that directory.
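
The three steps above can be sketched roughly as follows. This is only a sketch: it assumes the entry names are ordinary relative paths using / as the separator, and has_single_root_dir and extract_album are hypothetical helper names, not part of the zipfile module.

```python
import os
from zipfile import ZipFile


def has_single_root_dir(names):
    """True if every entry sits under one shared top-level directory."""
    roots = set()
    for name in names:
        parts = [p for p in name.split("/") if p]
        if not parts:
            continue
        if len(parts) == 1 and not name.endswith("/"):
            return False  # a file stored directly at the top level
        roots.add(parts[0])
    return len(roots) == 1


def extract_album(zip_path, dest_root):
    """Extract zip_path under dest_root, adding a folder only if needed."""
    with ZipFile(zip_path) as zf:
        if has_single_root_dir(zf.namelist()):
            # Entries already carry their own folder: unzip as-is.
            target = dest_root
        else:
            # Folder named after the archive, minus the .zip extension.
            stem = os.path.splitext(os.path.basename(zip_path))[0]
            target = os.path.join(dest_root, stem)
        os.makedirs(target, exist_ok=True)
        zf.extractall(target)
    return target
```

In the loop from the question, the body would then become a single call such as extract_album(os.path.join('music_folder', zipped_album), 'cache').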

The one tricky bit is that zipfile paths can be Windows or POSIX format, and they can be absolute paths or UNC paths or even paths starting with .., and the logic for transforming those to usable paths is, while not exactly difficult, more than just a one-liner. So, you have to decide how far you want to go with making your code fully general.
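
To give a feel for what that normalization involves, here is one possible sketch. safe_parts is a hypothetical helper, and it deliberately errs on the side of stripping: anything that looks like a drive letter or UNC share is dropped, along with empty, . and .. components. Note that ZipFile.extract and extractall already do similar cleanup on the names they write, so a helper like this is mostly useful when you inspect the names yourself, for example for the common-directory check.

```python
import ntpath


def safe_parts(entry_name):
    """Turn an arbitrary zip entry name into safe relative path components."""
    # Treat both Windows and POSIX separators as directory separators.
    name = entry_name.replace("\\", "/")
    # Drop a Windows drive letter (C:) or UNC share (//server/share), if any.
    _drive, name = ntpath.splitdrive(name)
    parts = []
    for part in name.split("/"):
        if part in ("", ".", ".."):
            continue  # skip empty, current-dir and parent-dir components
        parts.append(part)
    return parts
```

With this, both album2/song12.mp3 and album2\song12.mp3 come out as the same two components, and an entry such as ../../song1.mp3 is reduced to just song1.mp3.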

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1