Read h5 file using AWS boto3

I am trying to read an h5 file from AWS S3 using boto3.

client = boto3.client('s3',key ='key')
result = client.get_object(Bucket='bucket', Key='file')
with h5py.File(result['Body'], 'r') as f:
    data = f

TypeError: expected str, bytes or os.PathLike object, not StreamingBody

Any idea?

h5py version is 2.10, boto3 version is 1.7.58

The same question has been asked before, but it has no answer.



Solution 1:[1]

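h5py.File() expects a path to a local file on disk (or, since h5py 2.9, a Python file-like object that supports seek()). You are passing it the StreamingBody returned by get_object(), which is a stream over the HTTP response and, in the versions you list, does not support seek(). h5py therefore treats it as a path, which raises the TypeError.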

You can download the file with:

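import boto3
import h5py

s3_client = boto3.client('s3')

# Download the S3 object to a local file first
s3_client.download_file('bucket', 'key', 'filename')

# h5py can now open the file by path
with h5py.File('filename', 'r') as f:
    # Read what you need inside the block; f is closed on exit
    data = list(f.keys())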
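If writing to disk is not desirable, a possible alternative is to read the whole object into memory and wrap it in io.BytesIO, since h5py 2.9 and later accept a seekable Python file-like object. The following is a minimal sketch under that assumption, not a tested recipe for every setup. The helper names to_seekable and read_h5_keys are hypothetical, and bucket and key are placeholders.

```python
import io


def to_seekable(body):
    """Read a non-seekable stream (e.g. a boto3 StreamingBody) fully into memory."""
    return io.BytesIO(body.read())


def read_h5_keys(bucket, key):
    """Hypothetical helper: return the top-level names of an HDF5 object in S3."""
    # Imported here so to_seekable() stays usable without boto3/h5py installed.
    import boto3
    import h5py

    s3_client = boto3.client('s3')
    result = s3_client.get_object(Bucket=bucket, Key=key)
    buffer = to_seekable(result['Body'])
    # h5py >= 2.9 accepts a file-like object that supports read() and seek().
    with h5py.File(buffer, 'r') as f:
        return list(f.keys())
```

This keeps the entire file in memory, so it is only reasonable for files that fit comfortably in RAM.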

Solution 2:[2]

A working solution that uses tempfile for temporary storage. It reads the model data from your S3 bucket into memory, writes it to a file in a temporary directory, and loads the model from that file into a variable.

import tempfile
from keras import models
import boto3

# Create the low-level client
client = boto3.client(
    's3',
    aws_access_key_id='ACCESS_KEY_ID',
    aws_secret_access_key='ACCESS_SECRET_KEY',
    region_name='us-east-1'
)

# Fetch the object from S3
response = client.get_object(
    Bucket='bucket-name',
    Key='model/model.h5'
)

model_name = 'model.h5'
# Read the whole object into memory as bytes
model_bytes = response['Body'].read()

# Save the bytes to temporary storage
with tempfile.TemporaryDirectory() as tempdir:
    model_path = f"{tempdir}/{model_name}"
    with open(model_path, 'wb') as my_data_file:
        my_data_file.write(model_bytes)
    # Load the model only after the file is closed, so all bytes are flushed to disk
    gotten_model = models.load_model(model_path)

# summary() prints the model summary itself
gotten_model.summary()
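A variation on the same idea is to let boto3 write the object straight into the temporary directory with download_file, which avoids holding the bytes in a Python variable first. This is only a sketch: download_to_dir and load_model_from_s3 are hypothetical helper names, and the bucket and key values are whatever you pass in.

```python
import os
import tempfile


def download_to_dir(client, bucket, key, directory):
    """Download s3://bucket/key into directory and return the local path."""
    local_path = os.path.join(directory, os.path.basename(key))
    # download_file writes the object to disk instead of returning its bytes.
    client.download_file(bucket, key, local_path)
    return local_path


def load_model_from_s3(bucket, key):
    """Hypothetical helper: fetch a Keras .h5 model from S3 and load it."""
    # Imported here so download_to_dir() can be used without boto3/keras installed.
    import boto3
    from keras import models

    client = boto3.client('s3')
    with tempfile.TemporaryDirectory() as tempdir:
        path = download_to_dir(client, bucket, key, tempdir)
        # Load while the temporary directory still exists.
        return models.load_model(path)
```

For example, load_model_from_s3('bucket-name', 'model/model.h5') would mirror the call in the answer above, with credentials taken from the default boto3 configuration rather than hard-coded.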

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    John Rotenstein
Solution 2    Samuel Tosan Ayo