Write CSV to Google Cloud Storage
I am trying to understand how to write a multi-line CSV file to Google Cloud Storage. I'm just not following the documentation.
The closest existing question is: Unable to read csv file uploaded on google cloud storage bucket
Example:
from google.cloud import storage
from oauth2client.client import GoogleCredentials
import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "<pathtomycredentials>"

a = [1, 2, 3]
b = ['a', 'b', 'c']

storage_client = storage.Client()
bucket = storage_client.get_bucket("<mybucketname>")
blob = bucket.blob("Hummingbirds/trainingdata.csv")

for eachrow in range(3):
    blob.upload_from_string(str(a[eachrow]) + "," + str(b[eachrow]))
That leaves a single line in Google Cloud Storage:
3,c
Clearly it opened a new file each time and wrote just that one line.
Okay, how about adding a newline delimiter?
for eachrow in range(3):
    blob.upload_from_string(str(a[eachrow]) + "," + str(b[eachrow]) + "\n")
That adds the line break, but again each call writes from the beginning.
Can someone illustrate the right approach? I could combine all my lines into one string, or write a temp file, but that seems very ugly.
Perhaps something like with open(...) as file?
Solution 1:[1]
The blob.upload_from_string(data) method creates a new object whose contents are exactly the contents of the string data. It overwrites existing objects rather than appending to them.
The easiest solution would be to write your whole CSV to a temporary file and then upload that file to GCS with the blob.upload_from_filename(filename) method.
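A minimal sketch of that approach, reusing the question's placeholder bucket and object names and assuming the standard-library csv and tempfile modules are acceptable:

import csv
import os
import tempfile

from google.cloud import storage

a = [1, 2, 3]
b = ['a', 'b', 'c']

# Write every row to one local temporary file first.
# delete=False keeps the file around after the with-block so it can be uploaded.
with tempfile.NamedTemporaryFile(mode='w', newline='', suffix='.csv', delete=False) as tmp:
    writer = csv.writer(tmp)
    for row in zip(a, b):
        writer.writerow(row)

# Upload the finished file in a single call, then clean up.
bucket = storage.Client().get_bucket("<mybucketname>")
blob = bucket.blob("Hummingbirds/trainingdata.csv")
blob.upload_from_filename(tmp.name, content_type='text/csv')
os.remove(tmp.name)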
Solution 2:[2]
Please refer to the answer below; I hope it helps.
import pandas as pd

data = [['Alex', 'Feb', 10], ['Bob', 'jan', 12]]
df = pd.DataFrame(data, columns=['Name', 'Month', 'Age'])
print(df)
Output
Name Month Age
0 Alex Feb 10
1 Bob jan 12
Add a row
row = ['Sally', 'Oct', 15]
df.loc[len(df)] = row
print(df)
Output
Name Month Age
0 Alex Feb 10
1 Bob jan 12
2 Sally Oct 15
Write/copy the file to a GCS bucket using gsutil (the leading ! runs the shell command from a notebook cell):
df.to_csv('text.csv', index=False)
!gsutil cp 'text.csv' 'gs://BucketName/folderName/'
Python code (docs: https://googleapis.dev/python/storage/latest/index.html):
from google.cloud import storage

def upload_to_bucket(bucket_name, blob_path, local_path):
    bucket = storage.Client().bucket(bucket_name)
    blob = bucket.blob(blob_path)
    blob.upload_from_filename(local_path)
    return blob.public_url  # Blob has no .url attribute; public_url is the object's link

# method call
bucket_name = 'bucket-name'  # just the bucket name, without gs://
blob_path = 'path/folder name inside bucket'  # object path inside the bucket
local_path = 'local_machine_path_where_file_resides'  # local file path
upload_to_bucket(bucket_name, blob_path, local_path)
Solution 3:[3]
from google.cloud import storage
from oauth2client.client import GoogleCredentials
import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "<pathtomycredentials>"

a = [1, 2, 3]
b = ['a', 'b', 'c']

storage_client = storage.Client()
bucket = storage_client.get_bucket("<mybucketname>")
blob = bucket.blob("Hummingbirds/trainingdata.csv")

# build up the complete csv string
csv_string_to_upload = ''
for eachrow in range(3):
    # add the lines
    csv_string_to_upload = csv_string_to_upload + str(a[eachrow]) + ',' + b[eachrow] + '\n'

# upload the complete csv string
blob.upload_from_string(
    data=csv_string_to_upload,
    content_type='text/csv'
)
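A variant of the same idea, building the CSV in memory with the standard-library csv module instead of manual concatenation, so quoting and escaping are handled correctly (a sketch; the bucket and object names are the question's placeholders):

import csv
import io

from google.cloud import storage

a = [1, 2, 3]
b = ['a', 'b', 'c']

# csv.writer writes rows into an in-memory text buffer.
buffer = io.StringIO()
writer = csv.writer(buffer)
for row in zip(a, b):
    writer.writerow(row)

# Upload the buffer's contents in one call.
bucket = storage.Client().get_bucket("<mybucketname>")
blob = bucket.blob("Hummingbirds/trainingdata.csv")
blob.upload_from_string(buffer.getvalue(), content_type='text/csv')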
Solution 4:[4]
Just ran into this post after encountering the exact same problem. After a lot of struggle, I found that the best solution for me was to upload the .csv file as bytes. Here is how I did that:
# this snippet comes from inside a function; csv_path, df, bucket, and the
# date strings are variables defined elsewhere in the original code
new_csv_filename = csv_path + "report_" + start_date_str + "-" + end_date_str + ".csv"
df.to_csv(new_csv_filename, index=False)

# upload the file to the storage
blob = bucket.blob(new_csv_filename)
with open(new_csv_filename, 'rb') as f:  # open the file with the read-bytes option
    blob.upload_from_file(f)  # upload_from_file then uploads the file as bytes
blob.make_public()

# generate a download url and return it
return blob.public_url
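If writing a local file first is unwanted, note that df.to_csv() called with no path returns the CSV as a string, which upload_from_string accepts directly; a minimal sketch under that assumption (the bucket name is a placeholder and the object name is hypothetical):

import pandas as pd
from google.cloud import storage

df = pd.DataFrame({'Name': ['Alex', 'Bob'], 'Age': [10, 12]})

bucket = storage.Client().get_bucket("<mybucketname>")  # placeholder bucket name
blob = bucket.blob("report.csv")  # hypothetical object name

# to_csv() with no path returns the CSV text instead of writing a file.
blob.upload_from_string(df.to_csv(index=False), content_type='text/csv')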
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | |
| Solution 4 | Arielel |