How to use asyncio to download files from an S3 bucket

I'm using the following code to download all the files in an S3 bucket:

import os

import boto3

def main(bucket_name, destination_dir):
    bucket = boto3.resource('s3').Bucket(bucket_name)
    for obj in bucket.objects.all():
        if obj.key.endswith('/'):  # skip "folder" placeholder keys
            continue
        destination = os.path.join(destination_dir, obj.key)
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        bucket.download_file(obj.key, destination)

I would like to know how to make this asynchronous, if possible.

Thank you in advance.



Solution 1:[1]

Aiobotocore provides asyncio support for the botocore library using aiohttp. If you are willing to modify your code to use botocore instead of boto3, that would be a solution.

Solution 2:[2]

Out of the box, boto3 doesn't support asyncio. There's an open tracking issue on this that collects some workarounds; they may or may not fit your use case.
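One workaround along those lines is to keep boto3 as-is and push its blocking calls onto worker threads. A minimal sketch, with a stand-in function in place of the real boto3 call:

```python
import asyncio
import time

def blocking_download(key: str) -> str:
    # stand-in for a blocking boto3 call such as bucket.download_file(key, dest)
    time.sleep(0.1)
    return key

async def download_all(keys):
    # asyncio.to_thread (Python 3.9+) runs each blocking call in a worker
    # thread, so downloads overlap even though boto3 itself is synchronous
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_download, k) for k in keys))

keys = ['a.txt', 'b.txt', 'c.txt']
print(asyncio.run(download_all(keys)))  # ['a.txt', 'b.txt', 'c.txt']
```

Note that boto3 clients are not formally documented as thread-safe for all operations, so it is safest to share only the resource/client for simple calls like download_file, or create one per thread.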

Solution 3:[3]

You can use the generate_presigned_url method of the S3 client to get a URL that embeds the AWS credentials (see docs), then download the file through an async HTTP client (aiohttp, for example).

aiohttp applies URL canonicalization, which can cause issues if the object key includes spaces or non-ASCII characters. Passing URL(..., encoded=True) avoids this.

import asyncio
import os

import boto3
from aiohttp import client
from yarl import URL

bucket = 'some-bucket-name'

s3_client = boto3.client('s3')
# list_objects_v2 returns at most 1000 keys per call; paginate for larger buckets
s3_objs = s3_client.list_objects_v2(Bucket=bucket).get('Contents', [])

async def download_s3_obj(key: str, aiohttp_session: client.ClientSession):
    # the presigned URL carries the AWS credentials, so the plain HTTP
    # client can fetch the object without knowing anything about boto3
    request_url = s3_client.generate_presigned_url('get_object', {
        'Bucket': bucket,
        'Key': key
    })

    # encoded=True stops aiohttp from re-encoding (and thus invalidating) the URL
    async with aiohttp_session.get(URL(request_url, encoded=True)) as response:
        file_path = os.path.join('some-local-folder-name', key.split('/')[-1])

        with open(file_path, 'wb') as file:
            file.write(await response.read())

async def main():
    async with client.ClientSession() as session:
        await asyncio.gather(
            *(download_s3_obj(obj['Key'], session) for obj in s3_objs))

asyncio.run(main())
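One practical caveat: the gather above starts every download at once, which can exhaust file descriptors or connections on a large bucket. A small sketch of capping concurrency with asyncio.Semaphore (the limit of 8 and the fake_download helper are illustrative assumptions):

```python
import asyncio

async def with_limit(sem: asyncio.Semaphore, coro):
    # at most `limit` coroutines get past the semaphore at any one time
    async with sem:
        return await coro

async def gather_limited(coros, limit: int = 8):
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(with_limit(sem, c) for c in coros))

async def fake_download(key: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for download_s3_obj(key, session)
    return key

keys = ['a', 'b', 'c']
print(asyncio.run(gather_limited([fake_download(k) for k in keys])))  # ['a', 'b', 'c']
```

In the solution above you would wrap each download_s3_obj(...) coroutine the same way before passing it to gather.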

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: M. de Young
Solution 2: Corey Quinn
Solution 3: John P