Listing objects in S3 with suffix using boto3
import boto3

def get_latest_file_movement(**kwargs):
    get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket',Prefix='prefix')['Contents']
    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]
    return last_added
The code above gets me the latest file; however, I only want the files ending with 'csv'.
Solution 1:[1]
You can check whether the keys end with .csv:
import boto3

def get_latest_file_movement(**kwargs):
    get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket',Prefix='prefix')['Contents']
    # keep only keys ending with .csv, newest first, then take the first one
    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True) if obj['Key'].endswith('.csv')][0]
    return last_added
Solution 2:[2]
Filter by suffix
If the S3 object's key is a filename, the suffix of your objects is a filename extension (like .csv).
So filter the objects by keys ending with .csv.
Use the filter(predicate, iterable) operation with a lambda predicate that tests str.endswith(suffix):
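import boto3

def get_latest_file_movement(**kwargs):
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket',Prefix='prefix')['Contents']
    # filter() returns a lazy iterator in Python 3, so build a list before sorting in place
    csvs = list(filter(lambda obj: obj['Key'].endswith('.csv'), objs)) # csv only
    csvs.sort(key=lambda obj: obj['LastModified'], reverse=True) # last first, sort by modified-timestamp descending
    return csvs[0] # the newest object's dict; use csvs[0]['Key'] for just the key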
Note: To get the last-modified only
This solution reverses the sort direction using reverse=True (descending) and picks the first element, which is the last modified.
You can also sort in the default (ascending) order and pick the last element with [-1], as answered by Kache in your preceding question.
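The two sort directions can be compared side by side. The following is only a sketch on stand-in data: the dictionaries mimic the 'Contents' entries of a list_objects_v2 response, the keys and dates are made up, and no S3 call is made.

```python
from datetime import datetime, timezone

# Hypothetical sample shaped like the 'Contents' entries of list_objects_v2
objs = [
    {'Key': 'prefix/a.csv', 'LastModified': datetime(2021, 1, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/b.csv', 'LastModified': datetime(2021, 3, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/c.txt', 'LastModified': datetime(2021, 5, 1, tzinfo=timezone.utc)},
]

def by_modified(obj):
    # sort key: the datetime stored under 'LastModified'
    return obj['LastModified']

# csv only
csvs = [obj for obj in objs if obj['Key'].endswith('.csv')]

# descending order, newest is the first element
newest_desc = sorted(csvs, key=by_modified, reverse=True)[0]

# ascending (default) order, newest is the last element
newest_asc = sorted(csvs, key=by_modified)[-1]
```

Both variants select the same object here; which one reads better is mostly a matter of taste.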
Simplification
From the boto3 list_objects_v2 docs about the response structure:
Contents (list) ... LastModified (datetime) -- Creation date of the object.
Boto3 returns a datetime object for LastModified. See also Getting S3 objects' last modified datetimes with boto.
So why do we need the additional steps of formatting it as a string and then converting it to an int, as in int(obj['LastModified'].strftime('%s'))?
Python can also sort datetime objects directly.
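As a rough illustration of comparing the datetime values directly, here is a sketch that skips the strftime('%s') round trip (that format code is platform-dependent anyway) and uses max() instead of a full sort. The helper name latest_csv_key and the sample data are hypothetical; in real use, objs would be the 'Contents' list from the response.

```python
from datetime import datetime, timezone

def latest_csv_key(objs):
    """Return the key of the most recently modified .csv object, or None if there is none."""
    csvs = (obj for obj in objs if obj['Key'].endswith('.csv'))
    # datetime objects compare directly, so no string/int conversion is needed
    newest = max(csvs, key=lambda obj: obj['LastModified'], default=None)
    return newest['Key'] if newest is not None else None

# Hypothetical stand-in for the 'Contents' list of a list_objects_v2 response
sample = [
    {'Key': 'prefix/old.csv', 'LastModified': datetime(2020, 6, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/new.csv', 'LastModified': datetime(2021, 6, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/newest.json', 'LastModified': datetime(2022, 6, 1, tzinfo=timezone.utc)},
]

latest = latest_csv_key(sample)
```

Passing default=None to max() avoids the IndexError that [0] would raise when no key ends with .csv.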
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |