Listing objects in S3 with suffix using boto3

def get_latest_file_movement(**kwargs):
    get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket',Prefix='prefix')['Contents']
    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]
    return last_added

The code above gets me the latest file; however, I only want the files ending with 'csv'.



Solution 1:[1]

You can check if they end with .csv:

import boto3

def get_latest_file_movement(**kwargs):
    get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket', Prefix='prefix')['Contents']

    # Sort newest first, keep only keys ending with .csv, then take the first match
    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True) if obj['Key'].endswith('.csv')][0]

    return last_added

Solution 2:[2]

Filter by suffix

If the S3 object's key is a filename, the suffix for your objects is a filename-extension (like .csv).

So filter the objects by key ending with .csv.

Use the built-in filter(predicate, iterable) with a lambda predicate that tests str.endswith(suffix):

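import boto3

def get_latest_file_movement(**kwargs):
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket', Prefix='prefix')['Contents']

    # filter() returns an iterator in Python 3, so build a list before calling sort()
    csvs = list(filter(lambda obj: obj['Key'].endswith('.csv'), objs))  # csv only
    csvs.sort(key=lambda obj: obj['LastModified'], reverse=True)  # last first, sort by modified-timestamp descending

    return csvs[0]  # the whole object dict; use csvs[0]['Key'] for the key only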

Note: To get the last-modified only

This solution reverses the sort direction with reverse=True (descending), so the first element is the most recently modified. You can also sort in the default (ascending) order and pick the last element with [-1], as answered by Kache in your preceding question.
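
To illustrate the note above, here is a minimal sketch showing that both sort directions select the same object. The sample list is hypothetical data shaped like the entries of 'Contents'; it does not come from a real bucket.

```python
from datetime import datetime, timezone

# Hypothetical sample data shaped like entries of the 'Contents' list
objs = [
    {'Key': 'prefix/a.csv', 'LastModified': datetime(2021, 1, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/c.csv', 'LastModified': datetime(2021, 3, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/b.csv', 'LastModified': datetime(2021, 2, 1, tzinfo=timezone.utc)},
]

by_modified = lambda obj: obj['LastModified']

# Descending order: the most recently modified object comes first
newest_desc = sorted(objs, key=by_modified, reverse=True)[0]

# Ascending (default) order: the most recently modified object comes last
newest_asc = sorted(objs, key=by_modified)[-1]

print(newest_desc['Key'], newest_asc['Key'])
```

Both variants pick 'prefix/c.csv' here, so the choice between them is mostly a matter of readability.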

Simplification

From the boto3 list_objects_v2 docs about the response structure:

Contents (list) ... LastModified (datetime) -- Creation date of the object.

Boto3 returns a datetime object for LastModified. See also Getting S3 objects' last modified datetimes with boto.

So why take the additional steps of formatting it as a string and then converting it to an int, as in int(obj['LastModified'].strftime('%s'))?

Python can also sort datetime objects directly.
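
The simplification described above could look like the following sketch. The helper name latest_csv_key and the sample data are assumptions made for illustration; in real use you would pass in the 'Contents' list returned by list_objects_v2. It uses max() instead of a full sort, which is a substitution on my part rather than something from the answers, and it compares the datetime values as they are. That also sidesteps '%s', which is a platform-specific strftime directive and not one of the directives Python documents.

```python
from datetime import datetime, timezone

def latest_csv_key(objs):
    """Return the key of the most recently modified .csv object, or None if there is none."""
    csvs = [obj for obj in objs if obj['Key'].endswith('.csv')]
    # datetime objects are ordered, so LastModified can be used as the key directly
    latest = max(csvs, key=lambda obj: obj['LastModified'], default=None)
    return latest['Key'] if latest else None

# Hypothetical sample data shaped like entries of the 'Contents' list
sample = [
    {'Key': 'prefix/old.csv', 'LastModified': datetime(2021, 1, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/new.csv', 'LastModified': datetime(2021, 2, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/newest.txt', 'LastModified': datetime(2021, 3, 1, tzinfo=timezone.utc)},
]

print(latest_csv_key(sample))  # prefix/new.csv
```

Returning None for an empty result avoids the IndexError that indexing with [0] would raise when no key ends with .csv.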

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2