Upload large CSV file to Cloud Storage using Python
Hi, I am trying to upload a large CSV file, but I am getting the error below:
HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /upload/storage/v1/b/de-bucket-my-stg/o?uploadType=resumable&upload_id=ADPycdsyu6gSlyfklixvDgL7RLpAQAg6REm9j1ICarKvmdif3tASOl9MaqjQIZ5dHWpTeWqs2HCsL4hoqfrtVQAH1WpfYrp4sFRn (Caused by SSLError(SSLWantWriteError(3, 'The operation did not complete (write) (_ssl.c:2396)')))
Can someone help me with this?
Below is my code:
import os
import pandas as pd
import io
import requests
from google.cloud import storage

try:
    url = "https://cb-test-dataset.s3.ap-south-1.amazonaws.com/analytics/analytics.csv"
    cont = requests.get(url).content
    file_to_upload = pd.read_csv(io.StringIO(cont.decode('utf-8')))
except Exception as e:
    print('Error getting file: ' + str(e))

try:
    # xxx is replaced here.
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'C:/Users/haris/Desktop/de-project/xxx.json'
    storage_client = storage.Client()
    bucket_name = storage_client.get_bucket('de-bucket-my-stg')
    blob = bucket_name.blob('analytics.csv')
    blob.upload_from_string(file_to_upload.to_csv(), 'text/csv')
except Exception as e:
    print('Error uploading file: ' + str(e))
Solution 1:[1]
As mentioned in the documentation:
My recommendation is to gzip your file before sending it. Text files have a high compression rate (up to 100 times), and you can ingest gzip files directly into BigQuery without unzipping them.
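As a minimal sketch of that suggestion (reusing the bucket name and source URL from the question; the object name analytics.csv.gz and the application/gzip content type are assumptions, not from the original post), the CSV bytes can be compressed in memory before the upload:

import gzip
import requests
from google.cloud import storage

url = "https://cb-test-dataset.s3.ap-south-1.amazonaws.com/analytics/analytics.csv"
csv_bytes = requests.get(url).content  # raw CSV bytes; no need to parse with pandas just to upload

# Compress in memory; plain-text CSV shrinks dramatically, so far less data goes over the connection.
gz_bytes = gzip.compress(csv_bytes)

client = storage.Client()
bucket = client.get_bucket('de-bucket-my-stg')
blob = bucket.blob('analytics.csv.gz')
blob.upload_from_string(gz_bytes, content_type='application/gzip')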
The fastest method of uploading to Cloud Storage is to use the compose API and composite objects.
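A hedged sketch of that composite-object approach, using the client library's Blob.compose method (the part file names and the parts/ prefix are made up for illustration; only the bucket name comes from the question): upload the pieces individually, then stitch them together server-side.

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('de-bucket-my-stg')

# Upload the pieces first (the part file names are hypothetical for this example).
part_files = ['analytics_part0.csv', 'analytics_part1.csv']
part_blobs = []
for name in part_files:
    part = bucket.blob('parts/' + name)
    part.upload_from_filename(name)
    part_blobs.append(part)

# Compose stitches the uploaded parts into one object on the server, then the parts can be deleted.
final_blob = bucket.blob('analytics.csv')
final_blob.compose(part_blobs)
for part in part_blobs:
    part.delete()

Note that a single compose request accepts at most 32 source objects, so larger splits need to be composed in stages.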
For more information, you can refer to the Stack Overflow thread where the OP faced a similar error.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | Divyani Yadav