How to read data from a zipfile on a website without locally downloading zipfile

I am using the following piece of code:

import zipfile
import urllib

link = "http://www.dummypage.com/dummyfile.zip"
file_handle = urllib.urlopen(link)
zip_file_object = zipfile.ZipFile(file_handle, 'r')

I get the following error on execution. Please help.

Traceback (most recent call last):
  File "fcc.py", line 34, in <module>
    zip_file_object = zipfile.ZipFile(file_handle)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 807, in _RealGetContents
    endrec = _EndRecData(fp)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 208, in _EndRecData
    fpin.seek(0, 2)
AttributeError: addinfourl instance has no attribute 'seek'


Solution 1:[1]

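ZipFile needs a seekable file-like object, and the object returned by urllib.urlopen has no seek method, which is what the traceback is reporting. Wrap the downloaded data in an in-memory file-like buffer instead. For text data, the most common choice is StringIO. For binary data such as a zip archive, use io.BytesIO.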

import io
import urllib
import zipfile

link = "http://www.dummypage.com/dummyfile.zip"
file_handle = io.BytesIO(urllib.urlopen(link).read())
zip_file_object = zipfile.ZipFile(file_handle, 'r')

Note that the file is still downloaded in full. It is simply held in memory by the BytesIO buffer rather than saved to disk, so there is nothing for you to clean up afterwards.
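On Python 3, `urllib.urlopen` has moved to `urllib.request.urlopen`, so the same idea might look roughly like the sketch below. The helper names `open_zip_from_bytes` and `open_zip_from_url` are made up for illustration and are not part of any library.

```python
import io
import urllib.request
import zipfile


def open_zip_from_bytes(data):
    """Wrap raw zip bytes in a seekable in-memory buffer and open it."""
    return zipfile.ZipFile(io.BytesIO(data), "r")


def open_zip_from_url(url):
    """Download the whole archive into memory, then open it.

    Hypothetical helper: the entire file is read into RAM, so this
    only suits archives that comfortably fit in memory.
    """
    with urllib.request.urlopen(url) as response:
        return open_zip_from_bytes(response.read())
```

With a real URL in place of the placeholder from the question, `open_zip_from_url(link).namelist()` should list the members, and `.read(name)` should return the bytes of one of them.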

Solution 2:[2]

Can you use external tools? @ruario's answer to "Bash - how to unzip a piped zip file (from “wget -qO-”)" is very interesting. Basically, a zip archive stores its directory at the end of the file, and zip tools tend to need the entire file to get to that directory. However, the archive also includes an inline header in front of each entry, and some tools can use those headers to extract entries as the data streams in. If you don't mind calling out to bsdtar (or other tools), you can do this:

import urllib
import shutil
import subprocess as subp

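url_handle = urllib.urlopen("test.zip")
# bsdtar reads the archive from stdin ('-f-') and extracts it ('-x')
proc = subp.Popen(['bsdtar', '-xf-'], stdin=subp.PIPE)
# stream the download into bsdtar chunk by chunk
shutil.copyfileobj(url_handle, proc.stdin)
# closing stdin signals end of input so bsdtar can finish
proc.stdin.close()
proc.wait()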
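To see the layout described above, here is a small sketch that builds an archive in memory and looks for the two kinds of records. The signature bytes come from the zip format itself; the file names and contents are made up for the example.

```python
import io
import zipfile

# Build a tiny archive in memory so the example needs no network access.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "first")
    zf.writestr("b.txt", "second")
data = buf.getvalue()

LOCAL_HEADER = b"PK\x03\x04"      # inline header in front of each entry
END_OF_DIRECTORY = b"PK\x05\x06"  # record that closes the central directory

# The archive opens with an inline header, which a streaming tool can read first.
starts_with_local_header = data.startswith(LOCAL_HEADER)

# The directory's closing record sits at the very end of the file.
directory_end_offset = data.rfind(END_OF_DIRECTORY)

print(starts_with_local_header, directory_end_offset, len(data))
```

A reader that insists on the directory has to seek to the end of the data before it can do anything, which is why a non-seekable network stream fails with zipfile but can still be fed to a tool that reads the inline headers.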

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Community