Pandas read_csv specify AWS Profile
Pandas (v1.0.5) uses the s3fs library to connect to AWS S3 and read data. By default, s3fs uses the credentials found in the ~/.aws/credentials file under the default profile. How do I specify which profile pandas should use while reading a CSV from S3?
E.g.
import pandas as pd

s3_path = 's3://mybucket/myfile.csv'
df = pd.read_csv(s3_path)
$ cat ~/.aws/credentials
[default]
aws_access_key_id = ABCD
aws_secret_access_key = XXXX
[profile2]
aws_access_key_id = PQRS
aws_secret_access_key = YYYY
[profile3]
aws_access_key_id = XYZW
aws_secret_access_key = ZZZZ
Edit:
Current hack/working solution:
import botocore
import pandas as pd
import s3fs

# Build a botocore session pinned to the desired profile, then hand it
# to s3fs so that reads use those credentials.
session = botocore.session.Session(profile='profile2')
s3 = s3fs.core.S3FileSystem(anon=False, session=session)
df = pd.read_csv(s3.open(path_to_s3_csv))
The only issue with the above solution is that you need to import two different libraries and instantiate two objects. I'm keeping the question open to see if there is a cleaner/simpler method.
Solution 1:[1]
import s3fs
s3 = s3fs.S3FileSystem(anon=False, profile_name="your-profile-name")
I believe that, to avoid using boto directly, you can use S3FileSystem, which is part of s3fs. Then read through a file handle, something like:
with s3.open('bucket/file.txt', 'rb') as f:
    df = pd.read_csv(f)
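Note that this keyword may differ depending on your s3fs version: more recent releases appear to accept profile rather than profile_name, so check the signature of s3fs.S3FileSystem for the version you have installed.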
Solution 2:[2]
If you only need to use one profile, setting the environment variable "AWS_DEFAULT_PROFILE" works:
import os
import pandas as pd

os.environ["AWS_DEFAULT_PROFILE"] = "profile2"
df = pd.read_csv(path_to_s3_csv)
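Since this mutates process-global state (and s3fs may cache a filesystem instance created earlier under another profile, so it should be set before the first S3 read), one option is to set and restore the variable around the read. A minimal sketch, using a placeholder path:

import os
import pandas as pd

path_to_s3_csv = "s3://mybucket/myfile.csv"  # placeholder path

# Remember the previous value so the rest of the process is unaffected.
previous = os.environ.get("AWS_DEFAULT_PROFILE")
os.environ["AWS_DEFAULT_PROFILE"] = "profile2"
try:
    df = pd.read_csv(path_to_s3_csv)
finally:
    # Restore (or remove) the variable after the read.
    if previous is None:
        os.environ.pop("AWS_DEFAULT_PROFILE", None)
    else:
        os.environ["AWS_DEFAULT_PROFILE"] = previous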
Solution 3:[3]
I'm not sure that this is "better" but it seems to be working for me using boto3 directly without needing to use s3fs
or set an env variable.
import boto3
import pandas as pd

# Create a session pinned to the profile, then an S3 client from it.
s3_session = boto3.Session(profile_name="profile_name")
s3_client = s3_session.client("s3")

# get_object returns a dict; its "Body" is a file-like stream pandas can read.
df = pd.read_csv(s3_client.get_object(Bucket='bucket', Key='key.csv').get('Body'))
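If you start from a full s3:// URL like the one in the question, you can split it into the Bucket and Key arguments yourself; a small sketch using the standard library for the parsing (the profile name and path are placeholders):

from urllib.parse import urlparse

import boto3
import pandas as pd

s3_path = "s3://mybucket/myfile.csv"

# Split "s3://mybucket/myfile.csv" into its bucket and key parts.
parsed = urlparse(s3_path)
bucket, key = parsed.netloc, parsed.path.lstrip("/")

session = boto3.Session(profile_name="profile2")
client = session.client("s3")

# The "Body" of the response is a streaming file-like object.
df = pd.read_csv(client.get_object(Bucket=bucket, Key=key)["Body"])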
Solution 4:[4]
df = pd.read_csv(s3_path, storage_options=dict(profile='profile2'))
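This is arguably the cleanest route, but note that the storage_options argument was only added in pandas 1.2, so it will not work on the v1.0.5 mentioned in the question. The dict is forwarded to s3fs.S3FileSystem, so any of its keywords can be passed the same way; for example (the region here is a placeholder):

import pandas as pd

s3_path = "s3://mybucket/myfile.csv"

# "profile" picks the section of ~/.aws/credentials to use;
# "client_kwargs" is handed on to the underlying boto client.
df = pd.read_csv(
    s3_path,
    storage_options={"profile": "profile2",
                     "client_kwargs": {"region_name": "us-east-1"}},
)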
Solution 5:[5]
If you are unable to configure your .aws/config file:
import pandas as pd
import s3fs

KEY_ID = 'xxxx'      # AWS access key id
ACCESS_KEY = 'yyyy'  # AWS secret access key

fp = 's3://my-bucket/test/abc.csv'

# Pass the credentials explicitly instead of relying on a profile.
fs = s3fs.S3FileSystem(anon=False, key=KEY_ID, secret=ACCESS_KEY)
with fs.open(fp) as f:
    df = pd.read_csv(f)
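Conversely, if you do have profiles configured but simply want to avoid hard-coding the literal keys, one option is to resolve them from a profile at runtime and pass them the same way; a sketch assuming boto3 is installed:

import boto3
import pandas as pd
import s3fs

# Resolve the credentials for a named profile instead of embedding them.
creds = boto3.Session(profile_name="profile2").get_credentials()

fs = s3fs.S3FileSystem(
    anon=False,
    key=creds.access_key,     # access key id
    secret=creds.secret_key,  # secret access key
    token=creds.token,        # session token, if any (may be None)
)

with fs.open("s3://my-bucket/test/abc.csv") as f:
    df = pd.read_csv(f)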
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Edward Mendez |
| Solution 2 | Anil Sharma |
| Solution 3 | Scott Brenstuhl |
| Solution 4 | loknar |
| Solution 5 | yl_low |
Solution 5 | yl_low |