Listing objects in S3 with suffix using boto3
import boto3

def get_latest_file_movement(**kwargs):
    get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket', Prefix='prefix')['Contents']
    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]
    return last_added
The code above gets me the latest file, but I only want the files whose keys end with 'csv'.
Solution 1:[1]
You can add a condition to the list comprehension that keeps only the keys ending with .csv:
def get_latest_file_movement(**kwargs):
    get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket', Prefix='prefix')['Contents']
    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True) if obj['Key'].endswith('.csv')][0]
    return last_added
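One caveat with the one-liner above: it raises IndexError when no key ends with .csv, and a list_objects_v2 response has no 'Contents' key at all when nothing matches the prefix. A minimal defensive sketch follows, assuming a hypothetical helper named latest_key_with_suffix (not part of boto3) that takes the response dict, so it can be tried on hand-made data without AWS access:

```python
from datetime import datetime, timezone

def latest_key_with_suffix(response, suffix='.csv'):
    """Return the key of the newest object ending with suffix, or None.

    `response` is assumed to be shaped like a list_objects_v2 response.
    """
    # 'Contents' is absent when no object matches, so fall back to an empty list.
    objs = response.get('Contents', [])
    matching = [obj for obj in objs if obj['Key'].endswith(suffix)]
    # max() with a default avoids an exception when nothing matches.
    newest = max(matching, key=lambda obj: obj['LastModified'], default=None)
    return newest['Key'] if newest else None

# Hand-made sample data standing in for a real response.
sample = {'Contents': [
    {'Key': 'prefix/a.csv', 'LastModified': datetime(2021, 1, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/b.txt', 'LastModified': datetime(2021, 3, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/c.csv', 'LastModified': datetime(2021, 2, 1, tzinfo=timezone.utc)},
]}
print(latest_key_with_suffix(sample))  # prefix/c.csv
print(latest_key_with_suffix({}))      # None
```

With a real client this would be called as latest_key_with_suffix(s3.list_objects_v2(Bucket='my-bucket', Prefix='prefix')).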
Solution 2:[2]
Filter by suffix
If each S3 object's key is a file name, the suffix you are looking for is the file extension (such as .csv).
So filter the objects down to those whose key ends with .csv.
Use filter(predicate, iterable) with a lambda predicate that tests str.endswith(suffix):
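def get_latest_file_movement(**kwargs):
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket', Prefix='prefix')['Contents']
    csvs = filter(lambda obj: obj['Key'].endswith('.csv'), objs)  # csv only
    # filter() returns an iterator in Python 3, so use sorted() rather than list.sort()
    csvs = sorted(csvs, key=lambda obj: obj['LastModified'], reverse=True)  # sort by modified timestamp, newest first
    return csvs[0]  # the newest object's dict; use csvs[0]['Key'] for just the key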
Note: getting only the last-modified object
This solution reverses the sort direction with reverse=True (descending), so the first element is the most recently modified one.
You can also sort in the default (ascending) order and pick the last element with [-1], as answered by Kache in your preceding question.
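The two sort directions can be compared side by side. The sketch below uses hand-made dicts standing in for entries of the 'Contents' list (the keys and dates are made up for illustration) and also shows max(), which finds the newest object without sorting the whole list:

```python
from datetime import datetime, timezone

# Hand-made objects standing in for entries of the 'Contents' list.
objs = [
    {'Key': 'old.csv', 'LastModified': datetime(2020, 5, 1, tzinfo=timezone.utc)},
    {'Key': 'new.csv', 'LastModified': datetime(2020, 7, 1, tzinfo=timezone.utc)},
    {'Key': 'mid.csv', 'LastModified': datetime(2020, 6, 1, tzinfo=timezone.utc)},
]

by_modified = lambda obj: obj['LastModified']

newest_desc = sorted(objs, key=by_modified, reverse=True)[0]  # descending, take first
newest_asc = sorted(objs, key=by_modified)[-1]                # ascending, take last
newest_max = max(objs, key=by_modified)                       # no sort at all

print(newest_desc['Key'], newest_asc['Key'], newest_max['Key'])  # new.csv new.csv new.csv
```

All three agree here because the timestamps are distinct; if two objects share the same LastModified, the variants may pick different ones among the tied objects.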
Simplification
From the boto3 list_objects_v2 docs about the response structure:
Contents (list) ... LastModified (datetime) -- Creation date of the object.
Boto3 returns a datetime object for LastModified. See also Getting S3 objects' last modified datetimes with boto.
So there is no need for the extra steps of formatting it as a string and then converting that to an int with int(obj['LastModified'].strftime('%s')).
Python can sort datetime objects directly.
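To illustrate, datetime objects compare chronologically, so they work as a sort key without any conversion. The sketch below uses hand-made, timezone-aware UTC timestamps (an assumption chosen to resemble what boto3 returns). Where an integer epoch value really is needed, datetime.timestamp() is a documented alternative to the '%s' format code, which is not among the format codes listed in Python's documentation and is not supported on every platform:

```python
from datetime import datetime, timezone

stamps = [
    datetime(2022, 3, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2022, 2, 1, 12, 0, tzinfo=timezone.utc),
]

# datetimes define their own ordering, so they can be sorted as they are.
direct = sorted(stamps)

# Same order via integer epoch seconds, obtained portably with timestamp().
via_epoch = sorted(stamps, key=lambda dt: int(dt.timestamp()))

print(direct == via_epoch)     # True
print(direct[-1].isoformat())  # 2022-03-01T12:00:00+00:00
```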
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
