Fastest way to read and batch images from AWS S3

I'm wondering if there's a function similar to tf.keras.utils.image_dataset_from_directory that can read images from a folder in AWS S3 and return them as batches (as numpy arrays, a tf Dataset, etc.) in Python for preprocessing / training in SageMaker. The solutions I've found so far involve writing a custom function / for loop that reads the images one at a time and batches them manually (see the sketch below); I'm wondering if there's a more efficient way to do this.
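For reference, the one-object-at-a-time approach described above typically looks something like the following boto3 + PIL sketch. The bucket name, key prefix, and image size here are placeholders, not values from the question:

```python
import io

import boto3
import numpy as np
from PIL import Image

s3 = boto3.client("s3")
bucket = "my-bucket"       # hypothetical bucket name
prefix = "images/train/"   # hypothetical key prefix

# List every object key under the prefix
paginator = s3.get_paginator("list_objects_v2")
keys = [
    obj["Key"]
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
    for obj in page.get("Contents", [])
]

def load_batch(batch_keys, size=(224, 224)):
    """Download each object, decode and resize it, and stack into one array."""
    images = []
    for key in batch_keys:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        img = Image.open(io.BytesIO(body)).convert("RGB").resize(size)
        images.append(np.asarray(img, dtype=np.float32))
    return np.stack(images)

batch_size = 32
for i in range(0, len(keys), batch_size):
    batch = load_batch(keys[i : i + batch_size])
    # ... preprocess / feed the batch to the model ...
```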



Solution 1:[1]

You could try using s3fs-fuse to mount your S3 bucket as a local folder on your workstation or SageMaker instance. This creates a directory on the local filesystem that mirrors the contents of the bucket, so your function can be tricked into treating S3 as a regular folder.
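A minimal sketch of how this could look once the bucket is mounted; the bucket name, mount point, and credential option are placeholders, and image_dataset_from_directory assumes one subdirectory per class (or pass labels=None):

```python
# Mount the bucket once with s3fs-fuse (shell command, run on the instance), e.g.:
#   s3fs my-bucket /mnt/s3-images -o iam_role=auto
# Bucket name, mount point, and the credential option are assumptions for this example.

import tensorflow as tf

# With the bucket mounted, the Keras utility can read it like any local directory
dataset = tf.keras.utils.image_dataset_from_directory(
    "/mnt/s3-images/train",   # folder inside the mounted bucket
    image_size=(224, 224),
    batch_size=32,
)

for images, labels in dataset.take(1):
    print(images.shape, labels.shape)  # e.g. (32, 224, 224, 3) (32,)
```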

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Marcin