How to read data from a zipfile on a website without locally downloading the zipfile
I am using the following piece of code:
import zipfile
import urllib
link = "http://www.dummypage.com/dummyfile.zip"
file_handle = urllib.urlopen(link)
zip_file_object = zipfile.ZipFile(file_handle, 'r')
I get the following error on execution. Please help.
Traceback (most recent call last):
File "fcc.py", line 34, in <module>
zip_file_object = zipfile.ZipFile(file_handle)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 770, in __init__
self._RealGetContents()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 807, in _RealGetContents
endrec = _EndRecData(fp)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 208, in _EndRecData
fpin.seek(0, 2)
AttributeError: addinfourl instance has no attribute 'seek'
Solution 1:[1]
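ZipFile needs a file-like object that supports seek, and the object returned by urllib.urlopen does not, which is what the AttributeError is telling you. Read the response into an in-memory buffer first. For text data, the most common library for this is StringIO. For binary data such as a zip archive, the right one is io (io.BytesIO).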
import io
import urllib
import zipfile
link = "http://www.dummypage.com/dummyfile.zip"
file_handle = io.BytesIO(urllib.urlopen(link).read())
zip_file_object = zipfile.ZipFile(file_handle, 'r')
Note that the file is still downloaded in full. It is simply held in memory instead of being saved to disk, so there is nothing for you to clean up afterwards.
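The snippet above targets Python 2. As a minimal sketch of the same idea on Python 3, where urlopen lives in urllib.request, something like the following should work. The helper names open_zip_from_bytes, open_remote_zip and read_member_text are hypothetical and only here for illustration.

```python
import io
import zipfile
from urllib.request import urlopen  # Python 3 location of urlopen


def open_zip_from_bytes(data):
    """Wrap raw zip bytes in a seekable in-memory buffer and open it."""
    return zipfile.ZipFile(io.BytesIO(data), "r")


def open_remote_zip(url):
    """Download the whole archive into memory, then open it.

    Hypothetical helper: the archive is never written to disk, but it
    is still transferred in full before ZipFile can read it.
    """
    with urlopen(url) as response:
        return open_zip_from_bytes(response.read())


def read_member_text(archive, name, encoding="utf-8"):
    """Return one member's contents decoded as text (encoding is assumed)."""
    with archive.open(name) as member:
        return member.read().decode(encoding)
```

With the placeholder URL from the question, usage would look like archive = open_remote_zip("http://www.dummypage.com/dummyfile.zip"), followed by archive.namelist() to see what is inside.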
Solution 2:[2]
Can you use external tools? @ruario's answer to "Bash - how to unzip a piped zip file (from "wget -qO-")" is very interesting. Basically, a zip file stores its directory at the end of the file, so zip tools tend to need the entire file before they can get to the directory. However, the format also includes inline headers in front of each entry, and some tools can work from those instead. If you don't mind calling out to bsdtar (or other tools), you can do this:
import urllib
import shutil
import subprocess as subp
url_handle = urllib.urlopen("test.zip")
proc = subp.Popen(['bsdtar', '-xf-'], stdin=subp.PIPE)
shutil.copyfileobj(url_handle, proc.stdin)
proc.stdin.close()
proc.wait()
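To make the layout described in this answer concrete, here is a small Python 3 sketch that builds a one-entry archive in memory and checks where the two markers sit. The function names are made up for illustration. The signatures are the standard zip ones: PK\x03\x04 for an inline (local) header and PK\x05\x06 for the end-of-directory record.

```python
import io
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"  # inline header in front of each entry
END_OF_DIR_SIG = b"PK\x05\x06"    # end-of-central-directory record


def build_sample_zip():
    """Create a tiny one-entry archive entirely in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "alpha")
    return buf.getvalue()


def describe_layout(data):
    """Report where the inline header and the end record are found."""
    return {
        "starts_with_local_header": data.startswith(LOCAL_HEADER_SIG),
        "end_record_offset": data.rfind(END_OF_DIR_SIG),
        "size": len(data),
    }
```

For an archive without a comment, the end record is the last 22 bytes of the file. That is why ZipFile seeks to the end before doing anything else, while a streaming tool can start from the inline header at offset 0.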
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Community |