Extract file that contains specific string on filename from ZIP using Python zipfile

I have a ZIP file and I need to extract all the files (normally one) that contain the string "test" in the filename. They are all xlsx files.

I am using Python zipfile for that. This is my code that doesn't work:

zip.extract(r'*\test.*\.xlsx$', './')

The error I get:

KeyError: "There is no item named '*\\\\test.*\\\\.xlsx$' in the archive"

Any ideas?



Solution 1:[1]

You have multiple problems here:

  • The r prefix only marks the string as a raw string literal; it does not create a regular expression object, which you may have assumed. In any case, zip.extract() expects an exact member name (or a ZipInfo object), not a pattern.
  • The * quantifier at the start of the regex has nothing before it to repeat, so the pattern is not even a valid regex.
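Both points can be checked quickly in an interpreter. This is only a small sketch using the pattern from the question; the variable names pattern and compile_failed are made up for illustration:

```python
import re

pattern = r'*\test.*\.xlsx$'

# The r prefix only changes how backslashes in the literal are read;
# the result is still an ordinary str, not a compiled regex.
print(type(pattern))

# Compiling it as a regex fails because the leading * has nothing to repeat.
compile_failed = False
try:
    re.compile(pattern)
except re.error as exc:
    compile_failed = True
    print("invalid regex:", exc)
```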

You need to manually iterate through the zip file index and match the filenames against your regex:

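from zipfile import ZipFile
import re

# Open the archive (closed automatically when the with block ends)
with ZipFile('myzipfile.zip') as archive:
    for info in archive.infolist():
        # Extract only members whose full name contains "test" and ends in .xlsx
        if re.match(r'.*test.*\.xlsx$', info.filename):
            print(info.filename)
            archive.extract(info)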

You might also consider shell-style globbing with fnmatchcase from the fnmatch module: fnmatchcase(info.filename, '*test*.xlsx'). Behind the scenes it converts the pattern to a regex, but it makes your code slightly simpler.
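A minimal sketch of the globbing variant, assuming you want the matches extracted into a chosen folder. The helper name extract_matching and its parameters are hypothetical, not part of zipfile:

```python
from fnmatch import fnmatchcase
from zipfile import ZipFile


def extract_matching(zip_path, pattern='*test*.xlsx', dest='.'):
    """Extract every archive member whose name matches the glob pattern.

    Returns the list of member names that were extracted.
    """
    extracted = []
    with ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # fnmatchcase is case-sensitive on every platform, and its *
            # also matches '/', so members inside subfolders are included.
            if fnmatchcase(info.filename, pattern):
                archive.extract(info, dest)
                extracted.append(info.filename)
    return extracted
```

Calling extract_matching('myzipfile.zip', dest='./') would then cover the case in the question. Returning the extracted names makes it easy to check whether exactly one file matched.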

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Levi