Pandas read_csv specify AWS Profile

Pandas (v1.0.5) uses the s3fs library to connect to AWS S3 and read data. By default, s3fs uses the credentials found in the ~/.aws/credentials file under the [default] profile. How do I specify which profile pandas should use while reading a CSV from S3?

E.g.

import pandas as pd

s3_path = 's3://mybucket/myfile.csv'
df = pd.read_csv(s3_path)
$ cat ~/.aws/credentials
[default]
aws_access_key_id = ABCD
aws_secret_access_key = XXXX
[profile2]
aws_access_key_id = PQRS
aws_secret_access_key = YYYY
[profile3]
aws_access_key_id = XYZW
aws_secret_access_key = ZZZZ

Edit:

Current hack/working solution:

import botocore.session
import pandas as pd
import s3fs

session = botocore.session.Session(profile='profile2')
s3 = s3fs.core.S3FileSystem(anon=False, session=session)
df = pd.read_csv(s3.open(path_to_s3_csv))

The only issue with the above solution is that you need to import two different libraries and instantiate two objects. I'm keeping the question open to see if there is a cleaner/simpler method.



Solution 1:[1]

import pandas as pd
import s3fs

s3 = s3fs.S3FileSystem(anon=False, profile_name="your-profile-name")

I believe that to avoid using boto directly, you can use S3FileSystem, which is part of s3fs. (Note: newer versions of s3fs appear to have renamed this argument to profile.) Then read through a file handle, something like:

with s3.open('bucket/file.txt', 'rb') as f:
    df = pd.read_csv(f)

Solution 2:[2]

If you only need to use one profile, setting the environment variable "AWS_DEFAULT_PROFILE" works:

import os
import pandas as pd

os.environ["AWS_DEFAULT_PROFILE"] = "profile2"
df = pd.read_csv(path_to_s3_csv)
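One caveat: s3fs caches filesystem instances, so the variable has to be set before the first S3 access in the process. A minimal sketch of forcing a fresh instance, assuming a hypothetical bucket; clear_instance_cache comes from fsspec, which s3fs is built on:

import os
import pandas as pd
import s3fs

os.environ["AWS_DEFAULT_PROFILE"] = "profile2"

# s3fs caches S3FileSystem instances per parameter set; drop any
# instance created before the variable was set so the profile applies.
s3fs.S3FileSystem.clear_instance_cache()

df = pd.read_csv("s3://mybucket/myfile.csv")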

Solution 3:[3]

I'm not sure that this is "better", but it seems to work for me using boto3 directly, without needing to use s3fs or set an environment variable. It works because the Body returned by get_object is a file-like streaming object, which read_csv accepts.

import boto3
import pandas as pd

s3_session = boto3.Session(profile_name="profile_name")
s3_client = s3_session.client("s3")
df = pd.read_csv(s3_client.get_object(Bucket='bucket', Key='key.csv').get('Body'))
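If you need more than one file, the same session-scoped client can list and read every CSV under a prefix. A minimal sketch, assuming a hypothetical bucket and prefix:

import boto3
import pandas as pd

session = boto3.Session(profile_name="profile2")
client = session.client("s3")

# Page through every object under the prefix and read each CSV.
frames = []
paginator = client.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="mybucket", Prefix="data/"):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".csv"):
            body = client.get_object(Bucket="mybucket", Key=obj["Key"])["Body"]
            frames.append(pd.read_csv(body))

df = pd.concat(frames, ignore_index=True)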

Solution 4:[4]

Pandas (since version 1.2) forwards storage_options to the underlying s3fs/fsspec filesystem, so you can pass the profile directly:

df = pd.read_csv(s3_path, storage_options=dict(profile='profile2'))
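The same keyword works for writing and for the other fsspec-backed readers. A minimal round-trip sketch, assuming pandas >= 1.2 with s3fs installed and a hypothetical bucket:

import pandas as pd

opts = {"profile": "profile2"}

# pandas passes storage_options through to s3fs in both directions.
df = pd.read_csv("s3://mybucket/myfile.csv", storage_options=opts)
df.to_csv("s3://mybucket/myfile_copy.csv", index=False, storage_options=opts)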

Solution 5:[5]

If you are unable to configure your .aws/config file, you can pass the credentials explicitly:

import pandas as pd
import s3fs

KEY_ID = 'xxxx'
ACCESS_KEY = 'yyyy'
BUCKET = 'my-bucket'
fp = f's3://{BUCKET}/test/abc.csv'

fs = s3fs.S3FileSystem(anon=False, key=KEY_ID, secret=ACCESS_KEY)
with fs.open(fp) as f:
    df = pd.read_csv(f)
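The explicit keys can also go through pandas' storage_options (again assuming pandas >= 1.2), which avoids opening the file by hand; a minimal sketch:

import pandas as pd

# s3fs accepts the access key pair as the key/secret parameters.
df = pd.read_csv(
    's3://my-bucket/test/abc.csv',
    storage_options={'key': 'xxxx', 'secret': 'yyyy'},
)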

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Edward Mendez
Solution 2: Anil Sharma
Solution 3: Scott Brenstuhl
Solution 4: loknar
Solution 5: yl_low