Fastest way to read and batch images from AWS S3
I'm wondering if there's a function similar to tf.keras.utils.image_dataset_from_directory that can read images from a folder in AWS S3 and return them as batches (as numpy arrays, a tf Dataset, etc.) in Python for preprocessing / training in SageMaker. The solutions I've found so far involve writing a custom function / for loop that reads the images one at a time and batches them manually; I'm wondering if there's a more efficient way to do this.
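For reference, the one-image-at-a-time approach described above might look roughly like the following sketch, using boto3 and PIL; the bucket name, prefix, batch size, and image size are hypothetical placeholders:

```python
# Minimal sketch of the manual approach: list objects under an S3 prefix,
# read them one at a time, and batch the decoded arrays by hand.
import io

import boto3
import numpy as np
from PIL import Image

s3 = boto3.client("s3")
bucket, prefix = "my-bucket", "images/train/"  # hypothetical names
batch_size, image_size = 32, (224, 224)

# Collect the keys of all image objects under the prefix.
paginator = s3.get_paginator("list_objects_v2")
keys = [
    obj["Key"]
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
    for obj in page.get("Contents", [])
    if obj["Key"].lower().endswith((".jpg", ".jpeg", ".png"))
]

def batches():
    """Yield numpy arrays of shape (batch, H, W, 3), one batch at a time."""
    batch = []
    for key in keys:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        img = Image.open(io.BytesIO(body)).convert("RGB").resize(image_size)
        batch.append(np.asarray(img, dtype=np.float32))
        if len(batch) == batch_size:
            yield np.stack(batch)
            batch = []
    if batch:  # final partial batch
        yield np.stack(batch)
```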
Solution 1:[1]
You could try using s3fs-fuse to mount your S3 bucket as a local folder on your workstation. This creates a virtual directory on your machine that mirrors the contents of the bucket, so functions that expect a regular folder can read from S3 transparently.
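As a rough sketch, assuming the bucket has already been mounted with s3fs-fuse (for example `s3fs my-bucket /mnt/s3-images -o iam_role=auto` on a SageMaker notebook instance; the bucket name and mount point are placeholders), the mounted prefix then behaves like an ordinary directory and can be passed straight to image_dataset_from_directory:

```python
import tensorflow as tf

# The mounted S3 path, organized one subfolder per class,
# exactly as image_dataset_from_directory expects.
dataset = tf.keras.utils.image_dataset_from_directory(
    "/mnt/s3-images/train",   # hypothetical mount point
    image_size=(224, 224),
    batch_size=32,
)

# Inspect one batch to confirm the shapes.
for images, labels in dataset.take(1):
    print(images.shape, labels.shape)  # e.g. (32, 224, 224, 3) (32,)
```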
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Marcin |