How to use boto3 (or other Python) to list the contents of a _RequesterPays_ S3 bucket?
You can download a file via boto3 from a RequesterPays S3 bucket, as follows:
```python
s3_client.download_file('aws-naip', 'md/2013/1m/rgbir/38077/{}'.format(filename), full_path, {'RequestPayer':'requester'})
```
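The fourth positional argument of `download_file` in the call above is `ExtraArgs`, so the same request can be written with the keyword spelled out. A minimal sketch, assuming the caller passes in an already configured boto3 S3 client (the helper name `download_requester_pays` is hypothetical):

```python
def download_requester_pays(s3_client, bucket, key, dest_path):
    """Download one object from a Requester Pays bucket.

    Hypothetical helper: s3_client is assumed to be a boto3 S3 client,
    e.g. boto3.client('s3'). The RequestPayer entry tells S3 that your
    account agrees to pay for the request and the data transfer.
    """
    s3_client.download_file(
        bucket,
        key,
        dest_path,
        ExtraArgs={'RequestPayer': 'requester'},
    )
    return dest_path
```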
What I can't figure out is how to list the objects in the bucket. I get an authentication error when I try to call `objects.all()` on the bucket.
How can I use boto3 to enumerate the contents of a RequesterPays bucket? Note that this is a special kind of bucket in which the requester, not the bucket owner, pays the S3 charges.
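The `objects.all()` call in the question sends no `RequestPayer` parameter, which would explain the error: S3 rejects requests from non-owners to a Requester Pays bucket unless the requester acknowledges the charges. With the resource API, `objects.filter()` forwards its keyword arguments to the underlying `ListObjects` call, so a sketch like the following should work; it assumes `bucket` is a boto3 `Bucket` resource, and the name `list_keys_resource` is hypothetical:

```python
def list_keys_resource(bucket, prefix=''):
    """Yield keys from a Requester Pays bucket via the resource API.

    Assumes `bucket` is a boto3 S3 Bucket resource, e.g.
    boto3.resource('s3').Bucket('aws-naip'). objects.filter() passes
    Prefix and RequestPayer on to each ListObjects request, and the
    collection fetches further pages on its own while iterating.
    """
    for obj in bucket.objects.filter(Prefix=prefix, RequestPayer='requester'):
        yield obj.key
```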
Solution 1:[1]
The boto3 documentation lists an `S3.Client.list_objects` method, which can be used to enumerate objects:
```python
import boto3

s3_client = boto3.client('s3')
# 'RequesterPays' is used here as the bucket name
resp = s3_client.list_objects(Bucket='RequesterPays')

# print names of all objects ('Contents' is missing when there are none)
for obj in resp.get('Contents', []):
    print('Object Name: %s' % obj['Key'])
```
Output:
```
Object Name: pic.gif
Object Name: doc.txt
Object Name: page.html
```
If you get an access-denied error, make sure that the IAM user calling the API has the `s3:ListBucket` permission on the bucket; listing objects is authorized by `s3:ListBucket`, not `s3:GetObject`.
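Listing is checked against the bucket ARN, while downloads (`s3:GetObject`) are checked against object ARNs of the form `bucket/*`. A sketch of an IAM identity policy granting both, built as a Python dict so it can be serialized to JSON; the helper name and the read-only scope are assumptions, and for a bucket owned by another account the owner's bucket policy must also allow the access:

```python
import json


def read_only_policy(bucket):
    """Build a hypothetical IAM identity policy to list and read one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListObjects / ListObjectsV2 are authorized on the bucket itself
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # GetObject (used by download_file) is authorized on the objects
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# JSON text suitable for pasting into the IAM console or an API call
policy_json = json.dumps(read_only_policy("aws-naip"), indent=2)
```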
Solution 2:[2]
You have to pass the `RequestPayer` kwarg to the `list_objects` method.
Also, according to the boto3 docs:

> Note: ListObjectsV2 is the revised List Objects API and we recommend you use this revised API for new application development.
Putting that together with pagination would look like:
```python
import boto3

s3_client = boto3.client('s3')


def get_keys(bucket, prefix, requester_pays=False):
    """Get s3 objects from a bucket/prefix
    optionally use requester-pays header
    """
    extra_kwargs = {}
    if requester_pays:
        extra_kwargs = {'RequestPayer': 'requester'}

    next_token = 'init'
    while next_token:
        kwargs = extra_kwargs.copy()
        if next_token != 'init':
            kwargs.update({'ContinuationToken': next_token})

        resp = s3_client.list_objects_v2(
            Bucket=bucket, Prefix=prefix, **kwargs)

        try:
            next_token = resp['NextContinuationToken']
        except KeyError:
            next_token = None

        # 'Contents' is absent when no keys match the prefix
        for contents in resp.get('Contents', []):
            key = contents['Key']
            yield key
```
It would be used like:
```python
x = list(get_keys('aws-naip', 'co', requester_pays=True))
```
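boto3 also provides paginators, which run the `ContinuationToken` loop internally and forward extra keyword arguments such as `RequestPayer` to every request. An alternative sketch of `get_keys` built on `get_paginator('list_objects_v2')`; it takes the client as a parameter (an assumption, to keep it testable) and the name `get_keys_paginated` is hypothetical:

```python
def get_keys_paginated(s3_client, bucket, prefix='', requester_pays=False):
    """Yield every key under bucket/prefix using a boto3 paginator.

    s3_client is assumed to be boto3.client('s3'). The paginator issues
    repeated ListObjectsV2 calls, feeding NextContinuationToken back in
    until the listing is no longer truncated.
    """
    extra_kwargs = {'RequestPayer': 'requester'} if requester_pays else {}
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, **extra_kwargs):
        # Pages with no matching objects carry no 'Contents' entry
        for obj in page.get('Contents', []):
            yield obj['Key']
```

Usage mirrors the original: `list(get_keys_paginated(boto3.client('s3'), 'aws-naip', 'co', requester_pays=True))`.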
Solution 3:[3]
I had the same issue, so here is the code:
```python
import boto3

# list the buckets owned by your account
s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)

# list objects in a Requester Pays bucket
client = boto3.client('s3')
result = client.list_objects(Bucket='bucketname', RequestPayer='requester')
for o in result.get('Contents', []):
    print(o['Key'])
```
The response to the query is a dictionary; its `Contents` entry is a list of dictionaries, one per object, and each entry's `Key` is the path to the object. You can check all response fields in the `list_objects` documentation.
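Each entry in `Contents` also carries fields such as `Size` (in bytes) and `LastModified`. Since the requester pays for data transfer in these buckets, totalling `Size` over a page gives a rough idea of how much a download would move. A small sketch (the helper name `summarize_page` is hypothetical):

```python
def summarize_page(resp):
    """Return (keys, total_bytes) for one list_objects response page.

    'Contents' is omitted when the page has no objects, so default to [].
    """
    contents = resp.get('Contents', [])
    keys = [obj['Key'] for obj in contents]
    total_bytes = sum(obj['Size'] for obj in contents)
    return keys, total_bytes
```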
Note: `list_objects` returns at most 1,000 objects per call, so you would have to paginate using the `Marker` parameter (I will update this answer if you would like the full list). I assume you have already set up your access key and secret key; let me know if you need more details on that.
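For `list_objects` (V1), the response only includes `NextMarker` when a `Delimiter` is specified; otherwise the last `Key` of a truncated page serves as the next `Marker`. A minimal sketch of that loop for a Requester Pays bucket, assuming an S3 client is passed in (the function name `list_all_keys_v1` is hypothetical):

```python
def list_all_keys_v1(s3_client, bucket, prefix=''):
    """Yield every key using ListObjects (V1) with Marker-based paging.

    s3_client is assumed to be boto3.client('s3'); each page holds at
    most 1,000 keys.
    """
    kwargs = {'Bucket': bucket, 'Prefix': prefix, 'RequestPayer': 'requester'}
    while True:
        resp = s3_client.list_objects(**kwargs)
        contents = resp.get('Contents', [])
        for obj in contents:
            yield obj['Key']
        if not resp.get('IsTruncated') or not contents:
            break
        # NextMarker is only present when Delimiter is set; otherwise
        # continue after the last key returned on this page.
        kwargs['Marker'] = resp.get('NextMarker') or contents[-1]['Key']
```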
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Raf |
Solution 2 | perrygeo |
Solution 3 | Alexis Kanter |