GCP Cloud Function to write data to BigQuery runs with success but data doesn't appear in BigQuery table
I am running the following Cloud Function. It runs successfully and indicates that data was loaded to the table, but when I query the BigQuery table no data has been added. I get no errors and no indication that it isn't working.
```python
from google.cloud import bigquery
import pandas as pd

def download_data(event, context):
    df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/full_data.csv')
    # Create an empty list
    Row_list = []
    # Iterate over each row
    for index, rows in df.iterrows():
        # Create list for the current row
        my_list = [rows.date, rows.location, rows.new_cases, rows.new_deaths, rows.total_cases, rows.total_deaths]
        #print(my_list)

    # append the list to the final list
    Row_list.append(my_list)

    ## Get BigQuery set up
    client = bigquery.Client()
    table_id = "<project_name>.raw.daily_load"
    table = client.get_table(table_id)

    print(client)
    print(table_id)
    print(table)

    errors = client.insert_rows(table, Row_list)  # Make an API request.
    if errors == []:
        print("New rows have been added.")
```
Attempted so far:

- Check data was being pulled -> PASSED, I printed out Row_list and the data is there
- Run locally from my machine -> PASSED, the data appeared when I ran it from a Python terminal
- Print out the table details -> PASSED, see the attached screenshot; it all appears in the logs
- Confirm it is able to find the table -> PASSED, I changed the name of the table to one that didn't exist and it failed

Not sure what is next; any advice would be greatly appreciated.
Solution 1:[1]
This assumes that the App Engine default service account has the default Editor role assigned and that you have a very simple schema for the BigQuery table, for example:
| Field name | Type | Mode |
|---|---|---|
| date | STRING | NULLABLE |
| location | STRING | NULLABLE |
| new_cases | INTEGER | NULLABLE |
| new_deaths | INTEGER | NULLABLE |
| total_cases | INTEGER | NULLABLE |
| total_deaths | INTEGER | NULLABLE |
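If the table does not exist yet, a minimal sketch along these lines could create it; the project, dataset, and table names below are placeholders you would substitute:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Schema matching the six fields the function inserts.
schema = [
    bigquery.SchemaField("date", "STRING"),
    bigquery.SchemaField("location", "STRING"),
    bigquery.SchemaField("new_cases", "INTEGER"),
    bigquery.SchemaField("new_deaths", "INTEGER"),
    bigquery.SchemaField("total_cases", "INTEGER"),
    bigquery.SchemaField("total_deaths", "INTEGER"),
]

table = bigquery.Table("[PROJECT-ID].[DATASET].[TABLE]", schema=schema)
table = client.create_table(table)  # Raises if the table already exists.
```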
The following modification of your code should work for an HTTP-triggered function. Notice that you were not including Row_list.append(my_list) inside the for loop to populate your list with the elements, and that, according to the samples in the documentation, you should be using a list of tuples:
```python
from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()
table_id = "[PROJECT-ID].[DATASET].[TABLE]"

def download_data(request):
    df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/full_data.csv')
    # Create an empty list
    Row_list = []
    # Iterate over each row
    for index, rows in df.iterrows():
        # Create a tuple for the current row
        my_list = (rows.date, rows.location, rows.new_cases, rows.new_deaths, rows.total_cases, rows.total_deaths)
        # Append the tuple to the final list
        Row_list.append(my_list)

    ## Get BigQuery set up
    table = client.get_table(table_id)

    errors = client.insert_rows(table, Row_list)  # Make an API request.
    if errors == []:
        print("New rows have been added.")
```
Pair the function with this very simple requirements.txt file:
```
# Function dependencies, for example:
# package>=version
pandas
google-cloud-bigquery
```
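Since this version expects an HTTP trigger, deployment could look something like the following; the function name matches the entry point above, and the runtime is an assumption:

```
gcloud functions deploy download_data \
  --runtime python39 \
  --trigger-http \
  --allow-unauthenticated
```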
Solution 2:[2]
You can directly stream the data from the website to BigQuery using Cloud Functions, but the data should be clean and conform to BigQuery standards, else the insertion will fail. One more point to note is that the dataframe columns must match the table columns for the data to be successfully inserted. I tested this out and saw insertion errors returned by the client when the column names didn't match.
Writing the function
I have created a simple Cloud Function using the documentation and the pandas example. The dependencies that need to be included are google-cloud-bigquery and pandas.
main.py
```python
from google.cloud import bigquery
import pandas as pd

def hello_gcs(event, context):
    df = pd.read_csv('https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv')
    # Rename the columns if necessary so they match the BigQuery table
    # (set_axis(..., inplace=True) was removed in pandas 2.0).
    df.columns = ["Month", "Year_1", "Year_2", "Year_3"]

    table_id = "project.dataset.airtravel"

    ## Get BigQuery set up
    client = bigquery.Client()
    table = client.get_table(table_id)

    errors = client.insert_rows_from_dataframe(table, df)  # Make an API request.
    # insert_rows_from_dataframe returns one error list per inserted chunk,
    # so success means every chunk's list is empty.
    if all(len(chunk) == 0 for chunk in errors):
        print("Data Loaded")
        return "Success"
    else:
        print(errors)
        return "Failed"
```
requirements.txt
```
# Function dependencies, for example:
# package>=version
google-cloud-bigquery
pandas
```
Now you can directly deploy the function.
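The (event, context) signature marks this as a background function; assuming a Cloud Storage trigger, deployment could look roughly like this, with the bucket name as a placeholder:

```
gcloud functions deploy hello_gcs \
  --runtime python39 \
  --trigger-bucket [BUCKET_NAME]
```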
Output

(Screenshot of the resulting BigQuery table omitted.)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Daniel Ocando |
| Solution 2 | |