Get CSV from Google Drive and then load to pandas
My goal is to read a .csv file from Google Drive and load it into a dataframe.
I tried some answers here, but the file is not public and needs authentication.
I looked at the Google Drive API, but I got stuck there and don't know how to move forward. I did manage to open a Google Sheet and load it into a dataframe, but that is a different case. This is a sample for Google Sheets that works:
import pandas as pd
from googleapiclient.discovery import build

# `creds`, `sheet_id` and `sheet_range` are defined elsewhere.
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
sheets_file = sheet.values().get(
    spreadsheetId=sheet_id,
    range=sheet_range
).execute()

rows = sheets_file.get('values', [])
if len(rows) < 2:
    print('No data found.')
else:
    header = rows[0]  # Assumes first line is header!
    values = rows[1:]  # Everything else is data.
    all_data = []
    for col_id, col_name in enumerate(header):
        column_data = []
        for row in values:
            # The Sheets API omits trailing empty cells, so short rows are padded.
            column_data.append(row[col_id] if col_id < len(row) else None)
        ds = pd.Series(data=column_data, name=col_name)
        all_data.append(ds)
    df = pd.concat(all_data, axis=1)
    print(df.head())
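The column-by-column loop above can probably be collapsed into a single `pd.DataFrame` call. This is a minimal sketch, assuming `sheets_file` is the dict returned by `values().get(...).execute()` with the header in the first row; `sheet_values_to_dataframe` is a hypothetical helper name. Rows are padded or trimmed to the header width because the Sheets API omits trailing empty cells.

```python
import pandas as pd


def sheet_values_to_dataframe(sheets_file: dict) -> pd.DataFrame:
    """Build a DataFrame from a Sheets values().get() response (hypothetical helper)."""
    rows = sheets_file.get('values', [])
    if not rows:
        return pd.DataFrame()
    header, data = rows[0], rows[1:]
    width = len(header)
    # Pad short rows with None and trim long ones so every row matches the header.
    normalized = [(row + [None] * width)[:width] for row in data]
    return pd.DataFrame(normalized, columns=header)


# Example with a response-shaped dict, so no API call is needed.
sample = {'values': [['name', 'age'], ['Ann', '31'], ['Bob']]}
df = sheet_values_to_dataframe(sample)
print(df)
```

In the real script, `df = sheet_values_to_dataframe(sheets_file)` would replace everything after the `execute()` call.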
I also saw some Google Colab methods, but I can't use them because I am restricted to plain Python. Any idea how to approach this?
Solution 1:[1]
I believe your goal and situation are as follows.
- You want to download the CSV data from the CSV file on Google Drive.
- You can already get values from Google Sheets using googleapis for Python.
Pattern 1:
In this pattern, the CSV data is downloaded with googleapis and saved as a file. The content is retrieved with the "Files: get" method of Drive API v3.
Sample script:
import io

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

file_id = "###"  # Please set the file ID of the CSV file.
service = build('drive', 'v3', credentials=creds)
request = service.files().get_media(fileId=file_id)
with io.FileIO("sample.csv", mode='wb') as fh:
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))
- In this case, the CSV data can be converted to a dataframe with `df = pd.read_csv("sample.csv")`.
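The sample scripts expect the file ID in place of `"###"`. If you only have a sharing link, a small helper like this can pull the ID out. It is a sketch that assumes the link has the `/file/d/<ID>/` or `?id=<ID>` shape; those URL formats are an assumption, not a documented contract, and `extract_drive_file_id` is a hypothetical name.

```python
import re
from typing import Optional

# Assumed link shapes:
#   https://drive.google.com/file/d/<ID>/view
#   https://drive.google.com/open?id=<ID>
_DRIVE_ID_PATTERN = re.compile(r"(?:/file/d/|[?&]id=)([A-Za-z0-9_-]+)")


def extract_drive_file_id(link: str) -> Optional[str]:
    """Return the file ID embedded in a Drive sharing link, or None (hypothetical helper)."""
    match = _DRIVE_ID_PATTERN.search(link)
    return match.group(1) if match else None
```

For example, `file_id = extract_drive_file_id(share_link)`, where `share_link` is the link copied from the Drive UI.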
Pattern 2:
In this pattern, as a simpler method, the access token is taken from `creds`. The downloaded CSV data is not saved as a file. The content is retrieved with the "Files: get" method of Drive API v3.
Sample script:
import io

import requests

file_id = "###"  # Please set the file ID of the CSV file.
access_token = creds.token
url = "https://www.googleapis.com/drive/v3/files/" + file_id + "?alt=media"
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
res.raise_for_status()  # Fail on 401/403/404 instead of parsing an error body as CSV.
print(res.text)
- In this case, the CSV data can be converted directly to a dataframe with `df = pd.read_csv(io.StringIO(res.text))`.
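`res.text` relies on requests working out the text encoding from the response headers (or guessing it), so non-ASCII characters may be mis-decoded if the charset is missing. A sketch that parses the raw bytes (`res.content`) with an explicit codec instead, assuming the CSV is UTF-8; `utf-8-sig` also strips a leading byte-order mark. `csv_bytes_to_dataframe` is a hypothetical helper.

```python
import io

import pandas as pd


def csv_bytes_to_dataframe(content: bytes, encoding: str = "utf-8-sig") -> pd.DataFrame:
    """Parse raw CSV bytes (e.g. res.content) into a DataFrame (hypothetical helper)."""
    # pandas decodes with the explicit codec rather than relying on requests' choice.
    return pd.read_csv(io.BytesIO(content), encoding=encoding)


# With the Pattern 2 response this would be: df = csv_bytes_to_dataframe(res.content)
sample = "\ufeffname,city\nJosé,São Paulo\n".encode("utf-8")
df = csv_bytes_to_dataframe(sample)
print(df)
```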
Note:
- In the above scripts, please include the scope `https://www.googleapis.com/auth/drive.readonly` and/or `https://www.googleapis.com/auth/drive`. If you modify the scopes, please reauthorize so that the modified scopes are included in the access token. Please be careful about this.
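Pattern 2 reads `creds.token` directly, so an expired access token would likely produce a 401 response. A sketch that refreshes the token first, assuming `creds` is a google-auth credentials object exposing `valid`, `token` and `refresh(request)` (as `google.oauth2.credentials.Credentials` does). `bearer_headers` is a hypothetical helper; the transport request is passed in so the function itself has no hard dependency on google-auth.

```python
from typing import Any, Dict


def bearer_headers(creds: Any, transport_request: Any) -> Dict[str, str]:
    """Return an Authorization header, refreshing the token first if needed (hypothetical helper)."""
    # `valid` is False when the token is missing or expired;
    # `refresh(request)` asks the token endpoint for a new access token.
    if not creds.valid:
        creds.refresh(transport_request)
    return {"Authorization": "Bearer " + creds.token}


# Usage with real google-auth credentials:
# from google.auth.transport.requests import Request
# res = requests.get(url, headers=bearer_headers(creds, Request()))
```

Passing the transport request in keeps the helper easy to test with a stand-in credentials object.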
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |