Use numpy.load on file within zipfile
I have a zipfile which contains many npy files (file1.npy, file2.npy, file3.npy, ...). I would like to load them individually without extracting the zipfile on a filesystem. I have tried many things but I can't figure it out.
My guess was:
import zipfile
import numpy as np
a = {}
with zipfile.ZipFile('myfiles.zip') as zipper:
    for p in zipper.namelist():
        with zipper.read(p) as f:
            a[p] = np.load(f)
Any ideas?
Solution 1:[1]
Save 2 arrays, each to their own file:
In [452]: np.save('x.npy',x)
In [453]: np.save('y.npy',y)
With a file browser tool, create a zip file (xy.zip) containing both, and try to load it:
In [454]: np.load('xy.zip')
Out[454]: <numpy.lib.npyio.NpzFile at 0xb48968ec>
It looks like np.load detected that the file is a zip archive (independent of the name) and returned an NpzFile object. Let's assign it to a variable and try the normal .npz extraction:
In [455]: xy=np.load('xy.zip')
In [456]: xy['x']
Out[456]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [457]: xy['y']
Out[457]:
array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])
So load can perform the lazy load on any zip file of npy files, regardless of how it's created.
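The session above can be reproduced end to end in a script. The following is only a sketch: the helper name load_npy_zip, the temporary directory, and the demo arrays are assumptions made for illustration, and the archive is built with the zipfile module instead of a file browser tool. It relies on np.load returning an NpzFile for a zip archive, as seen above.

```python
import os
import tempfile
import zipfile

import numpy as np


def load_npy_zip(zip_path):
    """Load every .npy member of a plain zip archive into a dict of arrays."""
    arrays = {}
    # np.load returns an NpzFile for a zip archive; it can be used as a
    # context manager so the underlying file is closed afterwards.
    with np.load(zip_path) as archive:
        for key in archive.files:  # member names with the ".npy" suffix removed
            arrays[key] = archive[key]  # the array is read when it is accessed
    return arrays


# Build a small demo archive (hypothetical file names and data).
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "xy.zip")
x = np.arange(12).reshape(3, 4)
y = x.T.copy()
with zipfile.ZipFile(zip_path, "w") as zf:
    for name, arr in (("x", x), ("y", y)):
        npy_path = os.path.join(tmpdir, name + ".npy")
        np.save(npy_path, arr)
        zf.write(npy_path, arcname=name + ".npy")

loaded = load_npy_zip(zip_path)
print(sorted(loaded))
```

Note that the keys are the member names without the .npy extension, matching the xy['x'] access in the session.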
Solution 2:[2]
I believe the numpy function expects a file object, not the raw contents that zipper.read(p) returns. For zip files, I generally do something like:
with ZipFile(path, mode='r') as archive:
    with io.BufferedReader(archive.open(filename, mode='r')) as file:
        ...  # read from file here
I am guessing you should pass the result of zipper.open(p, mode='r') to np.load. Also, I strongly urge you not to use zipper.read(p), since it reads the whole file into memory at once. So, using your code conventions, try:
import io
with zipfile.ZipFile('myfiles.zip') as zipper:
    for p in zipper.namelist():
        with io.BufferedReader(zipper.open(p, mode='r')) as f:
            a[p] = np.load(f)
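For anyone who wants to try the file-object approach without an existing archive, here is a self-contained sketch. The helper name load_members, the in-memory demo archive, and the member names are assumptions for illustration. It also passes the object from zipper.open straight to np.load without the BufferedReader wrapper, which assumes Python 3.7 or later, where that object supports seek.

```python
import io
import zipfile

import numpy as np


def load_members(zip_file):
    """Return {member name: array} for each .npy member, read via ZipFile.open."""
    arrays = {}
    with zipfile.ZipFile(zip_file) as zipper:
        for name in zipper.namelist():
            if not name.endswith(".npy"):
                continue  # skip directories and unrelated files
            # zipper.open returns a file-like object for the member,
            # so nothing is extracted to the filesystem.
            with zipper.open(name, mode="r") as member:
                arrays[name] = np.load(member)
    return arrays


# Build a demo archive in memory (hypothetical member names and data).
demo_arrays = {
    "file1.npy": np.arange(6).reshape(2, 3),
    "file2.npy": np.linspace(0.0, 1.0, 5),
}
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name, arr in demo_arrays.items():
        payload = io.BytesIO()
        np.save(payload, arr)  # serialise the array in .npy format
        zf.writestr(name, payload.getvalue())
buffer.seek(0)

a = load_members(buffer)
print(sorted(a))
```

Unlike loading the whole archive with np.load, the keys here keep the full member name, including the .npy extension.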
Solution 3:[3]
I wrap the bytes returned by read in BytesIO before passing them to load. I do not know if it is efficient, but it works and is more readable :)
with ZipFile(fname) as z:
    for p in z.namelist():
        tmp = np.load(io.BytesIO(z.read(p)))
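A runnable version of the BytesIO idea might look like the sketch below. The function name load_via_bytesio, the in-memory demo archive, and the member name are assumptions for illustration. The trade-off is that each member is fully decompressed into memory before np.load parses it.

```python
import io
import zipfile

import numpy as np


def load_via_bytesio(zip_file):
    """Read each .npy member fully into memory, then parse it with np.load."""
    arrays = {}
    with zipfile.ZipFile(zip_file) as z:
        for name in z.namelist():
            if name.endswith(".npy"):
                # z.read returns the member's decompressed bytes; BytesIO
                # presents them as a seekable file object for np.load.
                arrays[name] = np.load(io.BytesIO(z.read(name)))
    return arrays


# Demo archive built in memory (hypothetical member name and data).
original = np.arange(12).reshape(3, 4)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", compression=zipfile.ZIP_DEFLATED) as z:
    payload = io.BytesIO()
    np.save(payload, original)
    z.writestr("file1.npy", payload.getvalue())
demo.seek(0)

result = load_via_bytesio(demo)
print(result["file1.npy"].shape)
```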
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Sigmun |
Solution 3 |