Get CSV from Google Drive and then load to pandas

My goal is to read a .csv file from Google Drive and load it into a DataFrame.

I tried some answers here, but the file is not public and requires authentication.

I looked into the Google Drive API but got stuck and don't know how to move forward. I did manage to open a Google Sheet and load it into a DataFrame, but that is a different case. This is the sample that works for Google Sheets:

import pandas as pd
from googleapiclient.discovery import build

# creds, sheet_id and sheet_range are assumed to be defined earlier.
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
sheets_file = sheet.values().get(
    spreadsheetId=sheet_id,
    range=sheet_range
).execute()

rows = sheets_file.get('values', [])
header = rows[0] if rows else []  # Assumes first line is header!
values = rows[1:]                 # Everything else is data.

if not values:
    print('No data found.')
else:
    all_data = []
    for col_id, col_name in enumerate(header):
        column_data = []
        for row in values:
            column_data.append(row[col_id])
        ds = pd.Series(data=column_data, name=col_name)
        all_data.append(ds)
    # Build the DataFrame once, after every column has been collected.
    df = pd.concat(all_data, axis=1)
    print(df.head())
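
As a side note, the column-by-column loop above can be collapsed into a single DataFrame constructor call. The sketch below is only an illustration: the helper name sheet_values_to_dataframe and the sample rows are made up, and it assumes the first row is the header. It also pads short rows, because the Sheets API omits trailing empty cells, which would otherwise make row[col_id] raise an IndexError.

```python
import pandas as pd


def sheet_values_to_dataframe(rows):
    """Build a DataFrame from a Sheets API 'values' list (first row = header)."""
    if not rows:
        return pd.DataFrame()
    header, data = rows[0], rows[1:]
    width = len(header)
    # Pad short rows with None and trim long ones so every row matches the header.
    padded = [list(row[:width]) + [None] * (width - len(row)) for row in data]
    return pd.DataFrame(padded, columns=header)


# Made-up sample shaped like sheets_file.get('values', []).
sample = [["name", "qty", "note"], ["apple", "3", "ripe"], ["pear", "5"]]
df = sheet_values_to_dataframe(sample)
print(df)
```

All cells arrive as strings, so numeric columns still need an explicit conversion such as pd.to_numeric.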

I saw some Google Colab methods too, but I can't use them because I am restricted to plain Python. Any idea how to approach this?

Solution 1:^[1]

I believe your goal and situation are as follows.

You want to download the CSV data from a CSV file on Google Drive.
You can already get values from a Google Spreadsheet using googleapis for Python.

Pattern 1:

In this pattern, the CSV data is downloaded with googleapis and saved as a file. The data is retrieved with the "Files: get" method of Drive API v3.

Sample script:

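import io

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

file_id = "###"  # Please set the file ID of the CSV file.

# creds is assumed to be an authorized credentials object.
service = build('drive', 'v3', credentials=creds)
request = service.files().get_media(fileId=file_id)
fh = io.FileIO("sample.csv", mode='wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while not done:
    # Each call downloads the next chunk and reports progress.
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
fh.close()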

In this case, the CSV data can be converted to the dataframe with df = pd.read_csv("sample.csv").
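
If the intermediate file is not needed, a small CSV can probably be fetched in a single request and parsed in memory. The following is a minimal sketch and not part of the original answer: the helper name drive_csv_to_dataframe is hypothetical, and it relies on the assumption that execute() on a get_media request returns the raw file bytes. For large files the chunked MediaIoBaseDownload loop above is likely the safer choice.

```python
import io

import pandas as pd


def drive_csv_to_dataframe(service, file_id, **read_csv_kwargs):
    """Fetch a Drive file's bytes in one request and parse them as CSV."""
    # Assumption: execute() on a get_media request returns the raw bytes.
    content = service.files().get_media(fileId=file_id).execute()
    # Extra keyword arguments (sep, dtype, ...) are passed on to pandas.
    return pd.read_csv(io.BytesIO(content), **read_csv_kwargs)
```

With the service object built as in the script above, usage would look like df = drive_csv_to_dataframe(service, file_id).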

Pattern 2:

In this pattern, as a simpler method, the access token from creds is used directly. The downloaded CSV data is not saved as a file. The data is retrieved with the "Files: get" method of Drive API v3.

Sample script:

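import requests

file_id = "###"  # Please set the file ID of the CSV file.

# creds is assumed to hold a valid (non-expired) access token.
access_token = creds.token
url = "https://www.googleapis.com/drive/v3/files/" + file_id + "?alt=media"
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
res.raise_for_status()  # Fail early on 401/403/404 instead of parsing an error body.
print(res.text)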

In this case, the CSV data can be directly converted to the dataframe with df = pd.read_csv(io.StringIO(res.text)).
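
Two details can trip up this pattern: creds.token may be missing or expired, and res.text depends on the encoding that requests guesses from the response headers. The sketch below is one possible way to handle both, under stated assumptions: the helper names fresh_access_token and fetch_drive_csv and the http_get parameter are hypothetical, creds is assumed to be a google-auth credentials object, and the file is assumed to be UTF-8 encoded.

```python
import io

import pandas as pd
import requests


def fresh_access_token(creds):
    """Return creds.token, refreshing first if the token is missing or expired."""
    if not creds.valid:
        # Imported lazily; only needed when a refresh is actually required.
        from google.auth.transport.requests import Request
        creds.refresh(Request())
    return creds.token


def fetch_drive_csv(file_id, access_token, http_get=requests.get):
    """Download a Drive file with a bearer token and parse it as CSV."""
    res = http_get(
        "https://www.googleapis.com/drive/v3/files/" + file_id,
        params={"alt": "media"},
        headers={"Authorization": "Bearer " + access_token},
        timeout=30,
    )
    res.raise_for_status()
    # Parse the raw bytes so pandas does the decoding (UTF-8 by default).
    return pd.read_csv(io.BytesIO(res.content))
```

Usage would be df = fetch_drive_csv(file_id, fresh_access_token(creds)). The http_get parameter exists only so the network call can be swapped out, for example in a test.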

Note:

In the above scripts, please include the scope https://www.googleapis.com/auth/drive.readonly and/or https://www.googleapis.com/auth/drive. If you modify the scopes, please reauthorize so that the new scopes are included in the access token. Please be careful about this.
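
For illustration, this is roughly how creds could be obtained with the read-only Drive scope. It is a sketch under assumptions, not the asker's actual setup: it assumes an OAuth client of the "Desktop app" type whose secrets are stored in a file named credentials.json, and it uses the google-auth-oauthlib package. The function name authorize is hypothetical.

```python
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def authorize(client_secrets_path="credentials.json"):
    """Run the installed-app OAuth flow and return credentials for SCOPES."""
    # Imported here so SCOPES can be reused without the OAuth dependency.
    from google_auth_oauthlib.flow import InstalledAppFlow

    flow = InstalledAppFlow.from_client_secrets_file(client_secrets_path, SCOPES)
    # Opens a browser window for consent; port=0 picks a free local port.
    return flow.run_local_server(port=0)
```

If an earlier token was cached on disk (Google's quickstart samples store it as token.json), that file has to be deleted after changing SCOPES so that the consent screen runs again with the new scopes.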

Reference:

Download files

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
