Possible to access an AWS public dataset using Cyberduck?
Cyberduck version: 7.9.2
Cyberduck is designed to access non-public AWS buckets. It asks for:
- Server
- Port
- Access Key ID
- Secret Access Key
The Registry of Open Data on AWS provides this information for an open dataset (using the example at https://registry.opendata.aws/target/):
- Resource type: S3 Bucket
- Amazon Resource Name (ARN): arn:aws:s3:::gdc-target-phs000218-2-open
- AWS Region: us-east-1
- AWS CLI Access (No AWS account required): aws s3 ls s3://gdc-target-phs000218-2-open/ --no-sign-request
Is there a version of s3://gdc-target-phs000218-2-open that can be used in Cyberduck to connect to the data?
Solution 1:[1]
No. The documentation states explicitly: "You must obtain the login credentials [in order to connect to Amazon S3 in Cyberduck]."
Solution 2:[2]
If the bucket is public, any valid AWS credentials will suffice. So as long as you can create an AWS account, you only need to create an IAM user for yourself with programmatic access, then enter that user's Access Key ID and Secret Access Key in Cyberduck.
Admittedly, this is a pain, because creating an AWS account requires a credit (or debit) card. But see https://stackoverflow.com/a/44825406/1094109
I tried this with s3://gdc-target-phs000218-2-open and it worked.
For RODA (Registry of Open Data on AWS) buckets that provide public access to specific prefixes, you need to edit the path to match, e.g. s3://cellpainting-gallery/jump-pilot/source_4/ (this is a RODA bucket maintained by us, not yet fully released).
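As a side note, if the goal is only to check what a public bucket contains before setting up Cyberduck, the unsigned listing from the question (`aws s3 ls ... --no-sign-request`) can probably also be reproduced over plain HTTPS with no credentials at all. The following is a minimal sketch, not a Cyberduck feature: it assumes the bucket permits anonymous listing and is reachable through the virtual-hosted-style S3 endpoint, and the function names (`build_list_url`, `parse_keys`, `list_public_keys`) are hypothetical helpers made up for illustration. It uses only the Python standard library.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket, region="us-east-1", prefix="", max_keys=20):
    """Build an unsigned ListObjectsV2 URL for a bucket (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_keys(xml_text):
    """Extract the object keys from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{S3_NS}Key")]


def list_public_keys(bucket, region="us-east-1", prefix="", max_keys=20):
    """Fetch one page of keys anonymously; assumes public listing is allowed."""
    url = build_list_url(bucket, region, prefix, max_keys)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_keys(response.read())
```

Calling `list_public_keys("gdc-target-phs000218-2-open")` should return the first page of keys if the bucket allows anonymous listing, using the us-east-1 region given in the question. For a bucket that exposes only certain prefixes, pass the prefix explicitly, for example `prefix="jump-pilot/source_4/"`, and the bucket's own region.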
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | jellycsc |
Solution 2 |