Write CSV to Google Cloud Storage

I am trying to understand how to write a multi-line CSV file to Google Cloud Storage. I'm just not following the documentation.

The closest existing question I found: Unable to read csv file uploaded on google cloud storage bucket

Example:

from google.cloud import storage
from oauth2client.client import GoogleCredentials
import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "<pathtomycredentials>"

a=[1,2,3]

b=['a','b','c']

storage_client = storage.Client()
bucket = storage_client.get_bucket("<mybucketname>")

blob=bucket.blob("Hummingbirds/trainingdata.csv")

for eachrow in range(3):
    blob.upload_from_string(str(a[eachrow]) + "," + str(b[eachrow]))

That gets you a single line on Google Cloud Storage:

3,c

Clearly it created a new object each time and wrote only that single line.

Okay, how about adding a newline delimiter?

for eachrow in range(3):
    blob.upload_from_string(str(a[eachrow]) + "," + str(b[eachrow]) + "\n")

That adds the line break, but again each call overwrites the object from the beginning.

Can someone illustrate what the approach is? I could combine all my lines into one string, or write a temp file, but that seems very ugly.

Perhaps something like with open(...) as file?



Solution 1:[1]

The blob.upload_from_string(data) method creates a new object whose contents are exactly the contents of the string data. It overwrites existing objects rather than appending to them.

The easiest solution would be to write your whole CSV to a temporary file and then upload that file to GCS with the blob.upload_from_filename(filename) function.
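A minimal sketch of that approach, reusing the bucket and object names from the question (the csv/tempfile handling here is illustrative, not part of the original answer):

import csv
import tempfile

from google.cloud import storage

a = [1, 2, 3]
b = ['a', 'b', 'c']

storage_client = storage.Client()
bucket = storage_client.get_bucket("<mybucketname>")
blob = bucket.blob("Hummingbirds/trainingdata.csv")

# Write every row to a local temporary file first.
with tempfile.NamedTemporaryFile(mode='w', suffix='.csv', newline='', delete=False) as tmp:
    writer = csv.writer(tmp)
    for row in zip(a, b):
        writer.writerow(row)
    temp_filename = tmp.name

# Then upload the finished file in a single call.
blob.upload_from_filename(temp_filename, content_type='text/csv')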

Solution 2:[2]

Please refer to the answer below; hope it helps.

import pandas as pd

data = [['Alex','Feb',10],['Bob','jan',12]]
df = pd.DataFrame(data, columns=['Name','Month','Age'])
print(df)

Output

   Name Month  Age
0  Alex   Feb   10
1   Bob   jan   12

Add a row

row = ['Sally','Oct',15]
df.loc[len(df)] = row
print(df)

Output

    Name Month  Age
0   Alex   Feb   10
1    Bob   jan   12
2  Sally   Oct   15

Write/copy to a GCS bucket using gsutil

df.to_csv('text.csv', index=False)
# "!" runs a shell command from a Jupyter notebook cell
!gsutil cp 'text.csv' 'gs://BucketName/folderName/'
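As a side note, recent pandas versions can usually write straight to a gs:// path when the gcsfs package is installed, which skips the local file and the gsutil step entirely (a sketch, assuming application default credentials are already configured):

# Requires the gcsfs package; pandas delegates gs:// paths to it.
df.to_csv('gs://BucketName/folderName/text.csv', index=False)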

Python code (docs: https://googleapis.dev/python/storage/latest/index.html)

from google.cloud import storage

def upload_to_bucket(bucket_name, blob_path, local_path):
    bucket = storage.Client().bucket(bucket_name)
    blob = bucket.blob(blob_path)
    blob.upload_from_filename(local_path)
    return blob.public_url

# method call
bucket_name = 'bucket-name'  # just the bucket name, without the gs:// prefix
blob_path = 'path/filename inside the bucket'
local_path = 'local_machine_path_where_file_resides'  # local file path
upload_to_bucket(bucket_name, blob_path, local_path)

Solution 3:[3]

from google.cloud import storage
import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "<pathtomycredentials>"

a=[1,2,3]

b=['a','b','c']

storage_client = storage.Client()
bucket = storage_client.get_bucket("<mybucketname>")

blob = bucket.blob("Hummingbirds/trainingdata.csv")

# build up the complete csv string
csv_string_to_upload = ''

for eachrow in range(3):
    # add the lines
    csv_string_to_upload = csv_string_to_upload + str(a[eachrow]) + ',' + b[eachrow] + '\n'

# upload the complete csv string
blob.upload_from_string(
    data=csv_string_to_upload,
    content_type='text/csv'
)
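If the rows get more complicated than this example, a variation on the same idea is to let the standard csv module build the string in an in-memory buffer instead of concatenating by hand (a sketch that assumes the same a, b, and blob objects as above):

import csv
import io

# Build the complete CSV in memory, then upload it in one call.
buffer = io.StringIO()
writer = csv.writer(buffer)
for row in zip(a, b):
    writer.writerow(row)

blob.upload_from_string(
    data=buffer.getvalue(),
    content_type='text/csv'
)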

Solution 4:[4]

Just ran into this post after encountering the exact same problem. After a lot of struggle, I found that the best solution for me was to upload the .csv file as bytes. Here is how I did that:

new_csv_filename = csv_path + "report_" + start_date_str + "-" + end_date_str + ".csv"
df.to_csv(new_csv_filename, index=False)
# upload the file to the storage
blob = bucket.blob(new_csv_filename)
with open(new_csv_filename, 'rb') as f:  # open the file in read-bytes mode
    blob.upload_from_file(f)  # upload_from_file streams the file's bytes to the blob
blob.make_public()
# generate a download url and return it
return blob.public_url 
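For what it's worth, blob.upload_from_filename should achieve the same result without opening the file by hand; the explicit open(..., 'rb') above just makes the bytes handling visible. A sketch, assuming the same bucket and filename as above:

blob = bucket.blob(new_csv_filename)
blob.upload_from_filename(new_csv_filename)  # reads and uploads the file contents
blob.make_public()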

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3
Solution 4 Arielel