Use numpy.load on file within zipfile

I have a zipfile which contains many npy files (file1.npy, file2.npy, file3.npy, ...). I would like to load them individually without extracting the zipfile on a filesystem. I have tried many things but I can't figure it out.

My guess was:

import zipfile
import numpy as np

a = {}

with zipfile.ZipFile('myfiles.zip') as zipper:
    for p in zipper.namelist():
        with zipper.read(p) as f:
            a[p] = np.load(f)

Any ideas?



Solution 1:[1]

Save 2 arrays, each to their own file:

In [452]: np.save('x.npy',x)
In [453]: np.save('y.npy',y)

With a file browser tool, create a zip file containing both files (xy.zip here), and try to load it:

In [454]: np.load('xy.zip')
Out[454]: <numpy.lib.npyio.NpzFile at 0xb48968ec>

Looks like np.load detected the zip format (independent of the file name) and returned an NpzFile object. Let's assign it to a variable and try the normal .npz access:

In [455]: xy=np.load('xy.zip')

In [456]: xy['x']
Out[456]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [457]: xy['y']
Out[457]: 
array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])

So np.load can lazily load from any zip file of npy files, regardless of how it was created.
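The same check can be done without a file browser. This is a minimal sketch, assuming the archive is built with the standard zipfile module rather than np.savez. The names x.npy, y.npy and xy.zip mirror the session above, and the temporary directory is only there to keep the example self-contained:

```python
import os
import tempfile
import zipfile

import numpy as np

x = np.arange(12).reshape(3, 4)
y = x.T

with tempfile.TemporaryDirectory() as tmpdir:
    x_path = os.path.join(tmpdir, "x.npy")
    y_path = os.path.join(tmpdir, "y.npy")
    zip_path = os.path.join(tmpdir, "xy.zip")
    np.save(x_path, x)
    np.save(y_path, y)

    # Build an ordinary zip archive (not via np.savez) holding both files.
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(x_path, arcname="x.npy")
        zf.write(y_path, arcname="y.npy")

    # np.load recognises the zip signature and returns an NpzFile;
    # a member is only read from the archive when it is indexed.
    with np.load(zip_path) as xy:
        names = sorted(xy.files)   # member names with ".npy" stripped
        x_loaded = xy["x"]
        y_loaded = xy["y"]
```

Note that the keys are the member names without the .npy extension, which is why xy['x'] works for a member stored as x.npy.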

Solution 2:[2]

The numpy function expects a file object, not the resulting bytes. For zip files, I generally do something like:

with ZipFile(path, mode='r') as archive:
    with io.BufferedReader(archive.open(filename, mode='r')) as file:
        ...  # use `file` like any other binary file object

I am guessing you should pass zipper.open(p, mode='r') to np.load. Also, I strongly urge you not to use zipper.read(p), since it reads the whole file into memory at once. So, using your code conventions, try:

import io  # needed in addition to the imports in the question

with zipfile.ZipFile('myfiles.zip') as zipper:
    for p in zipper.namelist():
        with io.BufferedReader(zipper.open(p, mode='r')) as f:
            a[p] = np.load(f)
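For comparison, here is a self-contained sketch of the same idea that hands the object returned by zipper.open straight to np.load, without the BufferedReader wrapper. It assumes Python 3.7 or later, where that object supports seek (np.load needs both read and seek). The in-memory archive and the helper npy_bytes are made up for the example and stand in for myfiles.zip:

```python
import io
import zipfile

import numpy as np


def npy_bytes(arr):
    """Serialize an array to .npy format in memory (illustrative helper)."""
    buf = io.BytesIO()
    np.save(buf, arr)
    return buf.getvalue()


# Build a small in-memory archive so the sketch needs no files on disk.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("file1.npy", npy_bytes(np.arange(3)))
    zf.writestr("file2.npy", npy_bytes(np.ones((2, 2))))

a = {}
with zipfile.ZipFile(archive) as zipper:
    for p in zipper.namelist():
        # zipper.open returns a readable file object, unlike zipper.read,
        # which returns plain bytes and cannot be used in a `with` block.
        with zipper.open(p) as f:
            a[p] = np.load(f)
```

After the loop, a maps each member name (for example 'file1.npy') to its array.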

Solution 3:[3]

I wrap the bytes in io.BytesIO before passing them to np.load. I do not know if it is efficient, but it works and is more readable :)

with ZipFile(fname) as z:
    for p in z.namelist():
        tmp = np.load(io.BytesIO(z.read(p)))
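The loop above overwrites tmp on every iteration. A possible way to keep all the arrays is sketched below. The helper name load_npy_members is hypothetical (it is not part of NumPy), and skipping members that do not end in .npy is an assumption about what the archive may contain:

```python
import io
import zipfile

import numpy as np


def load_npy_members(archive):
    """Return {member name without ".npy": array} for every .npy member.

    `archive` is a path or file-like object accepted by zipfile.ZipFile.
    Hypothetical helper for illustration only.
    """
    arrays = {}
    with zipfile.ZipFile(archive) as z:
        for p in z.namelist():
            if not p.endswith(".npy"):
                continue  # skip directories and unrelated files
            # z.read(p) decompresses the whole member into memory as bytes;
            # BytesIO turns those bytes into the file object np.load expects.
            arrays[p[:-len(".npy")]] = np.load(io.BytesIO(z.read(p)))
    return arrays


# Demo with a throwaway in-memory archive.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    payload = io.BytesIO()
    np.save(payload, np.arange(4))
    z.writestr("file1.npy", payload.getvalue())
    z.writestr("notes.txt", "not an array")

arrays = load_npy_members(demo)
```

Each member is held in memory twice for a moment (raw bytes plus the array), which is usually fine for small files but worth keeping in mind for large ones.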

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Sigmun
Solution 3