Load Pandas DataFrame to S3 passing s3_additional_kwargs
Please excuse my lack of knowledge in this area!
I'm looking to upload a dataframe to S3, but I need to pass 'ACL':'bucket-owner-full-control'.
import pandas as pd
import s3fs
fs = s3fs.S3FileSystem(anon=False, s3_additional_kwargs={'ACL': 'bucket-owner-full-control'})
df = pd.DataFrame()
df['test'] = [1,2,3]
df.head()
df.to_parquet('s3://path/to/file/df.parquet', compression='gzip')
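# note: fs is never passed to to_parquet here, so the ACL set via s3_additional_kwargs is not applied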
I have managed to get around this by converting the DataFrame to a PyArrow table and then writing it with pyarrow, like so:
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.Table.from_pandas(df)
pq.write_to_dataset(table=table,
                    root_path='s3://path/to/file/',
                    filesystem=fs)
But this feels hacky, and there must be a way to pass the ACL in the first example.
Solution 1:[1]
With Pandas 1.2.0 and later, there is a storage_options argument on the writer functions.
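For example, a minimal sketch (assuming Pandas >= 1.2.0 with the pyarrow engine; the bucket path is a placeholder):
df.to_parquet(
    "s3://foo/bar.parquet",
    storage_options={"s3_additional_kwargs": {"ACL": "bucket-owner-full-control"}},
)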
If you are stuck with Pandas < 1.2.0 (1.1.3 in my case), this trick helped:
import s3fs

# s3_additional_kwargs is forwarded by s3fs to the underlying S3 API calls
storage_options = dict(anon=False, s3_additional_kwargs=dict(ACL="bucket-owner-full-control"))
fs = s3fs.S3FileSystem(**storage_options)
df.to_parquet('s3://foo/bar.parquet', filesystem=fs)
Solution 2:[2]
You can do it like this (the bucket name and credentials are placeholders):
df.to_parquet('s3://bucket/name.parquet',
              storage_options={"key": xxxxx, "secret": gcp_secret_access_key, "s3_additional_kwargs": {"ACL": "bucket-owner-full-control"}})
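The keys in storage_options are forwarded to the fsspec filesystem constructor (s3fs.S3FileSystem for s3:// paths), so anything that constructor accepts, such as key, secret, and s3_additional_kwargs, can be supplied this way.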
Solution 3:[3]
As mentioned before, with Pandas 1.2.0 there is a storage_options argument to most writer functions (to_csv, to_parquet, etc.). To set the ACL when writing to S3 (in this case the filesystem backend that is used is s3fs), you can use this example:
import pandas as pd

# everything under storage_options is handed to the s3fs filesystem
ACL = dict(storage_options=dict(s3_additional_kwargs=dict(ACL='bucket-owner-full-control')))

df = pd.DataFrame({"column": [1, 2, 3, 4]})
df.to_parquet("s3://bucket/file.parquet", **ACL)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sergey Vasilyev |
| Solution 2 | Simas Joneliunas |
| Solution 3 | |