I'm trying to simplify access to datasets in various file formats (csv, pickle, feather, partitioned parquet, ...) stored as S3 objects. Since some users I support …
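One way to hide the format differences behind a single entry point is a small dispatch table over pandas readers. This is a minimal sketch, not an established API: the helper name `read_dataset`, the extension mapping, and the bucket path are all placeholders. It relies on pandas accepting `s3://` URLs when s3fs is installed (pyarrow is also needed for feather/parquet):

```python
import pandas as pd

# Map file extensions to pandas readers. The mapping itself is an
# assumption; extend it for whatever formats your users actually store.
_READERS = {
    ".csv": pd.read_csv,
    ".pkl": pd.read_pickle,
    ".feather": pd.read_feather,
    ".parquet": pd.read_parquet,
}

def read_dataset(s3_uri: str) -> pd.DataFrame:
    """Dispatch to a pandas reader based on the object's extension."""
    for ext, reader in _READERS.items():
        if s3_uri.rstrip("/").endswith(ext):
            return reader(s3_uri)
    # Partitioned parquet datasets are directories rather than single
    # objects, so fall back to read_parquet for extension-less paths.
    return pd.read_parquet(s3_uri)

df = read_dataset("s3://my-bucket/datasets/example.csv")  # hypothetical path
```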
I have two pipelines that I run. The first pipeline reads files from S3, does some processing, and updates the files. The second pipeline runs multiple jobs and …
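For the first pipeline, the read-process-write roundtrip can stay entirely in pandas. A hedged sketch, assuming CSV objects and s3fs installed; the bucket, key, and the transform are stand-ins for whatever the real pipeline does:

```python
import pandas as pd

SOURCE = "s3://my-bucket/input/data.csv"  # hypothetical object

def update_file(uri: str) -> None:
    df = pd.read_csv(uri)           # read the current S3 object
    df["processed"] = True          # stand-in for the real processing step
    df.to_csv(uri, index=False)     # write back, overwriting the object

update_file(SOURCE)
```

Note that S3 overwrites are whole-object replacements, so if the second pipeline's jobs read the same keys concurrently, they will see either the old or the new object, never a partial write.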
Pandas (v1.0.5) uses the s3fs library to connect to AWS S3 and read data. By default, s3fs uses the credentials found in the ~/.aws/credentials file under the default profile.
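To make s3fs use a different profile, two common approaches are sketched below. Pandas 1.0.5 predates the `storage_options` keyword (added in pandas 1.2), so on that version the environment variable or an explicit s3fs filesystem are the usual routes. The profile name and paths are placeholders:

```python
import os
import pandas as pd
import s3fs

# Option 1: select the profile before s3fs creates its boto session.
os.environ["AWS_PROFILE"] = "my-profile"  # placeholder profile name
df = pd.read_csv("s3://my-bucket/data.csv")

# Option 2: build the filesystem explicitly and hand pandas an open
# file object (some older s3fs releases spell the keyword profile_name).
fs = s3fs.S3FileSystem(profile="my-profile")
with fs.open("my-bucket/data.csv", "rb") as f:
    df = pd.read_csv(f)
```

On pandas 1.2 and later, the same thing can be done inline with `pd.read_csv(..., storage_options={"profile": "my-profile"})`.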
I'm looking for ways to read data from multiple partitioned directories on S3 using Python. The layout looks like data_folder/serial_number=1/cur_date=20-12-2012/abcdsd0324324.snappy.parquet.
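Because the layout uses hive-style `key=value` directories, pyarrow can discover the partitions and expose `serial_number` and `cur_date` as ordinary columns. A sketch assuming a placeholder bucket name and default credentials:

```python
import pyarrow.parquet as pq
import s3fs

fs = s3fs.S3FileSystem()  # picks up your default AWS credentials

# ParquetDataset walks the partitioned directory tree under the prefix
# and reads every .snappy.parquet file it finds.
dataset = pq.ParquetDataset("my-bucket/data_folder", filesystem=fs)
df = dataset.read().to_pandas()

# To read only some partitions, push the selection down as filters:
subset = pq.ParquetDataset(
    "my-bucket/data_folder",
    filesystem=fs,
    filters=[("serial_number", "=", "1")],
).read().to_pandas()
```

With a recent pandas/pyarrow, `pd.read_parquet("s3://my-bucket/data_folder")` does the same discovery in one call.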