How to add a new array to an existing npz file in a standard way?

I have a function which writes an array to a compressed *.npz file:

import numpy as np

def save_a(file):
    np.savez_compressed(file, a=[[1, 2, 3]])

I want to write a function that calls save_a() and then adds another array to the same file (e.g. with metadata):

def save_a_b(file):
    save_a(file)
    np.savez_compressed(file, b=[len(save_a.__name__)])

Unfortunately, when I call save_a_b(), the first array is overwritten:

import io

buffer = io.BytesIO()
save_a_b(buffer)
buffer.seek(0)
with np.load(buffer) as fh:
    for name in fh:
        print(name, fh[name])

I need to implement this without using:

  • private methods,
  • hacks on the .npz format, unless its specification is guaranteed to remain backward compatible in the future.

I would prefer a solution that works with both file objects and str paths.



Solution 1:[1]

How about this?

import numpy as np

def save_a(filename):
    np.savez_compressed(filename, a=[[1, 2, 3]])

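def save_a_b(filename):
    save_a(filename)
    # Read the existing arrays into memory and close the file before rewriting it.
    with np.load(filename) as npz:
        data = dict(npz.items())
    np.savez_compressed(filename, b=[len(save_a.__name__)], **data)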

npzfile = '/tmp/test.npz'
save_a_b(npzfile)

print(list(np.load(npzfile).items()))

Putting it into a function:

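def show_npzfile(npzfile):
    with np.load(npzfile) as data:
        for key, value in data.items():
            print(key, value)

def add_data(npzfile, overwrite_old=True, **new_data):
    # Load the existing arrays into memory and close the file before rewriting it.
    with np.load(npzfile) as npz:
        data = dict(npz.items())
    if not overwrite_old:
        # Swap so that existing arrays win over new ones with the same name.
        new_data, data = data, new_data
    data.update(new_data)
    np.savez_compressed(npzfile, **data)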

print(f'content of "{npzfile}":')
show_npzfile(npzfile)

add_data(npzfile, c=np.random.randint(0, 10, 10))

print(f'content of "{npzfile}":')
show_npzfile(npzfile)

# 'c' already exists, so with overwrite_old=False it keeps its old value and only 'd' is added.
add_data(npzfile, c=10, d='asdf', overwrite_old=False)
print(f'content of "{npzfile}":')
show_npzfile(npzfile)
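
The question also asks for something that works with file objects such as io.BytesIO, which the path-based code above does not handle directly: the buffer has to be rewound before reading and emptied before rewriting. Below is a minimal sketch of one way to do that, using only public NumPy calls. The name add_arrays is hypothetical, and the sketch assumes that a file object is a seekable binary stream opened for both reading and writing, and that a path already ends in .npz (np.savez_compressed appends the extension otherwise). Like the code above, it rewrites the whole archive rather than appending to it.

```python
import io
import os

import numpy as np


def add_arrays(file, **new_arrays):
    """Rewrite the archive in `file` with its old arrays plus `new_arrays`.

    `file` is either a path (str or os.PathLike) or a seekable binary
    file object opened for reading and writing. Hypothetical helper.
    """
    is_path = isinstance(file, (str, os.PathLike))

    if not is_path:
        # A previous write leaves the position at the end of the stream.
        file.seek(0)

    # Load every stored array into memory, then release the archive.
    with np.load(file) as npz:
        merged = {name: npz[name] for name in npz.files}

    # New arrays replace existing ones that share a name.
    merged.update(new_arrays)

    if not is_path:
        # Discard the old archive so the new one starts at offset 0.
        file.seek(0)
        file.truncate()

    np.savez_compressed(file, **merged)
```

With this helper, save_a_b(file) could call save_a(file) followed by add_arrays(file, b=[len(save_a.__name__)]). When a file object was used, call file.seek(0) before passing it to np.load.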

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Michael Habeck