How to list all datasets in an h5py file?

I have an HDF5 file, written with h5py, that stores NumPy arrays. When I try to open a dataset with the name I remember, I get an "Object doesn't exist" error. Is there a way to list the datasets the file contains?

    with h5py.File('result.h5','r') as hf:
        # How can I list all the datasets I have saved in hf?


Solution 1:[1]

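Use the keys() method. It gives you the names of the datasets and groups directly under the root group. On Python 3 the result is a view of the keys rather than a list, so wrap it in list() if you need one. For example:

dataset_names = list(hf.keys())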
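
As a minimal sketch of what keys() does and does not show, the snippet below builds a throwaway file and lists its root-level names. The file name demo.h5 and the names "top", "grp" and "nested" are made up for illustration and are not from the question. It also checks for a remembered path with the in operator, which is one way to avoid the "Object doesn't exist" error.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small throwaway file; all names here are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as hf:
    hf.create_dataset("top", data=np.arange(3))
    # Intermediate groups ("grp") are created automatically.
    hf.create_dataset("grp/nested", data=np.zeros((2, 2)))

with h5py.File(path, "r") as hf:
    # keys() only covers names directly under the root group.
    top_level = sorted(hf.keys())
    # Membership tests accept full paths, so a remembered name can be checked first.
    nested_found = "grp/nested" in hf
    missing_found = "no_such_dataset" in hf
    print(top_level, nested_found, missing_found)
```

Note that "grp/nested" exists in the file but does not appear in the root-level keys; only "grp" does.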

Another, GUI-based option is HDFView: https://support.hdfgroup.org/products/java/release/download.html

Solution 2:[2]

The other answers only show how to list the keys under the root group, each of which may itself be a group or a dataset.

If you want something closer to h5dump but in Python, you can do something like this:

import h5py

def descend_obj(obj, sep='\t'):
    """
    Recursively walk an HDF5 group, printing group and dataset names
    and each dataset's attributes.
    """
    if isinstance(obj, h5py.Group):  # h5py.File is a subclass of h5py.Group
        for key in obj.keys():
            print(sep, '-', key, ':', obj[key])
            descend_obj(obj[key], sep=sep + '\t')
    elif isinstance(obj, h5py.Dataset):
        for key in obj.attrs.keys():
            print(sep + '\t', '-', key, ':', obj.attrs[key])

def h5dump(path, group='/'):
    """
    Print HDF5 file metadata.

    group: a specific group to start from; defaults to the root group
    """
    with h5py.File(path, 'r') as f:
        descend_obj(f[group])

Solution 3:[3]

If you want the key names as a list, call the keys() method, which returns a view of the keys, and pass the result to the built-in list() function:

with h5py.File('result.h5','r') as hf:
    dataset_names = list(hf.keys())
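
Since the root-level keys mix groups and datasets, it can help to split them by type. The following is a small sketch, assuming a made-up file containing one dataset "a" and one group "g"; it uses items() and isinstance() to separate the two.

```python
import os
import tempfile

import h5py
import numpy as np

# Throwaway file with one dataset and one group (hypothetical names).
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as hf:
    hf.create_dataset("a", data=np.arange(4))
    hf.create_group("g")

with h5py.File(path, "r") as hf:
    # items() yields (name, object) pairs for the root group only.
    top_datasets = [name for name, obj in hf.items() if isinstance(obj, h5py.Dataset)]
    top_groups = [name for name, obj in hf.items() if isinstance(obj, h5py.Group)]
    print("datasets:", top_datasets)
    print("groups:", top_groups)
```

This still only looks at the top level; anything stored inside "g" would not be listed.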

Solution 4:[4]

If you are at the command line, use h5ls -r [file] or h5dump -n [file] as recommended by others.

Within Python, if you want to list everything below the topmost group without writing your own code to descend the tree, try the visit() method:

with h5py.File('result.h5','r') as hf:
    hf.visit(print)

Or, for something more advanced (e.g. to include attribute information), use visititems():

def printall(name, obj):
    print(name, dict(obj.attrs))

with h5py.File('result.h5','r') as hf:
    hf.visititems(printall)
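
The same callback approach can collect information instead of printing it. Here is a rough sketch, assuming a made-up file layout, that records the shape and dtype of every dataset in a dictionary keyed by path. The names record and summary are illustrative only.

```python
import os
import tempfile

import h5py
import numpy as np

# Throwaway file; the dataset names are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as hf:
    hf.create_dataset("ints", data=np.arange(5, dtype="int64"))
    hf.create_dataset("x/y/z", data=np.zeros((2, 3), dtype="float32"))

summary = {}

def record(name, obj):
    # visititems passes the path relative to the starting group and the object itself.
    if isinstance(obj, h5py.Dataset):
        summary[name] = (obj.shape, str(obj.dtype))
    # Returning None tells h5py to keep visiting.

with h5py.File(path, "r") as hf:
    hf.visititems(record)

print(summary)
```

Groups such as "x" and "x/y" are visited too, but the isinstance() check leaves them out of the result.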

Solution 5:[5]

The keys() method gives you only the top-level keys, and those include group names as well as datasets (as already pointed out by Seb). To list the datasets at every level, use the visit() method (as suggested by jasondet) and keep only the keys that point to datasets.

This answer merges jasondet's and Seb's answers into a simple function that does the trick:

import h5py

def get_dataset_keys(f):
    """Return the path of every dataset below the open file or group f."""
    keys = []
    f.visit(lambda key: keys.append(key) if isinstance(f[key], h5py.Dataset) else None)
    return keys
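
A possible way to try get_dataset_keys is sketched below. The function is repeated so the snippet runs on its own, and the file contents ("root_data", "a/b/data", "empty_group") are invented for the example.

```python
import os
import tempfile

import h5py
import numpy as np


def get_dataset_keys(f):
    # Same idea as the function above: visit every name, keep only datasets.
    keys = []
    f.visit(lambda key: keys.append(key) if isinstance(f[key], h5py.Dataset) else None)
    return keys


# Throwaway file with nested datasets and an empty group (hypothetical names).
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as hf:
    hf.create_dataset("root_data", data=np.arange(3))
    hf.create_dataset("a/b/data", data=np.ones(2))
    hf.create_group("empty_group")

with h5py.File(path, "r") as hf:
    # Sorted here only to make the output order predictable.
    dataset_paths = sorted(get_dataset_keys(hf))
    print(dataset_paths)
```

The returned paths can be passed straight back to the file object, e.g. hf["a/b/data"], to open each dataset.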

Solution 6:[6]

To show just the names of the datasets in the file, I would simply use h5dump -n <filename>

That works without running a Python script.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 max9111
Solution 2 Seb
Solution 3 Chris
Solution 4
Solution 5
Solution 6 nj2237