Load Pandas DataFrame to S3 passing s3_additional_kwargs
Please excuse my lack of knowledge in this area!
I'm looking to upload a dataframe to S3, but I need to pass 'ACL':'bucket-owner-full-control'.
import pandas as pd
import s3fs
fs = s3fs.S3FileSystem(anon=False, s3_additional_kwargs={'ACL': 'bucket-owner-full-control'})
df = pd.DataFrame()
df['test'] = [1,2,3]
df.head()
df.to_parquet('s3://path/to/file/df.parquet', compression='gzip')
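# note: fs is never passed to to_parquet here, so the ACL set via s3_additional_kwargs is not applied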
I have managed to get around this by converting the DataFrame to a PyArrow table and then writing it with pyarrow, like so:
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.Table.from_pandas(df)
pq.write_to_dataset(table=table,
                    root_path='s3://path/to/file/',
                    filesystem=fs)
But this feels hacky, and there must be a way to pass the ACL in the first example.
Solution 1:[1]
With Pandas 1.2.0 and later, there is a storage_options argument on the writer functions.
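For example, a minimal sketch (assuming Pandas >= 1.2.0 with the pyarrow engine; the bucket path is a placeholder):
df.to_parquet(
    "s3://foo/bar.parquet",
    storage_options={"s3_additional_kwargs": {"ACL": "bucket-owner-full-control"}},
)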
If you are stuck with Pandas < 1.2.0 (1.1.3 in my case), this trick helped:
import s3fs

# s3_additional_kwargs is forwarded by s3fs to the underlying S3 API calls
storage_options = dict(anon=False, s3_additional_kwargs=dict(ACL="bucket-owner-full-control"))
fs = s3fs.S3FileSystem(**storage_options)
df.to_parquet('s3://foo/bar.parquet', filesystem=fs)
Solution 2:[2]
You can do it like this (the bucket name and credentials are placeholders):
df.to_parquet('s3://bucket/name.parquet',
              storage_options={"key": xxxxx, "secret": gcp_secret_access_key, "s3_additional_kwargs": {"ACL": "bucket-owner-full-control"}})
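The keys in storage_options are forwarded to the fsspec filesystem constructor (s3fs.S3FileSystem for s3:// paths), so anything that constructor accepts, such as key, secret, and s3_additional_kwargs, can be supplied this way.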
Solution 3:[3]
As mentioned before, with Pandas 1.2.0 there is a storage_options argument to most writer functions (to_csv, to_parquet, etc.). To set the ACL when writing to S3 (in this case the filesystem backend that is used is s3fs), you can use this example:
import pandas as pd

# everything under storage_options is handed to the s3fs filesystem
ACL = dict(storage_options=dict(s3_additional_kwargs=dict(ACL='bucket-owner-full-control')))

df = pd.DataFrame({"column": [1, 2, 3, 4]})
df.to_parquet("s3://bucket/file.parquet", **ACL)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sergey Vasilyev |
| Solution 2 | Simas Joneliunas |
| Solution 3 | |