Is there a way to upload bytes created with zlib to Google BigQuery?
I want to load string data that has been compressed with Python's zlib library into BigQuery.
Here is an example that uses zlib to generate the data:
import zlib
import pandas as pd
string = 'abs'
df = pd.DataFrame()
data = zlib.compress(bytearray(string, encoding='utf-8'), -1)
df = df.append({'id': 1, 'data': data}, ignore_index=True)
I've also tried two different load methods, but both of them give me an error.
The schema is:
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("id", bigquery.enums.SqlTypeNames.NUMERIC),
        bigquery.SchemaField("data", bigquery.enums.SqlTypeNames.BYTES),
    ],
    write_disposition="WRITE_APPEND",
)
The methods I have tried are:
1. BigQuery API
job = bigquery_client.load_table_from_dataframe(
    df, table, job_config=job_config
)
job.result()
2. pandas_gbq
df.to_gbq(destination_table, project_id, if_exists='append')
However, both give similar errors.
1. Error from the BigQuery API load:
pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)
2. Error from pandas_gbq:
pandas_gbq.gbq.InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.
Is there any way to solve this?
I want to load a Python byte string into BigQuery as BYTES data.
Thank you
Solution 1:
The problem isn't coming from the insertion of your zlib-compressed data. The error occurs when the value of your id key (the DataFrame value 1) is inserted into the NUMERIC data type in BigQuery: NUMERIC is loaded as a 16-byte decimal value, while pandas holds 1 as an 8-byte number, which is what the "length 8 (expected 16)" message refers to.
The easiest solution for this is to change the datatype of your schema in BigQuery from NUMERIC to INTEGER.
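For illustration (this variant is mine, not from the original answer), the job_config with id declared as INTEGER would look like this; the id column of the destination table would have to be INTEGER as well:

job_config = bigquery.LoadJobConfig(
    schema=[
        # id as INTEGER matches the plain int 1 in the DataFrame.
        bigquery.SchemaField("id", bigquery.enums.SqlTypeNames.INTEGER),
        bigquery.SchemaField("data", bigquery.enums.SqlTypeNames.BYTES),
    ],
    write_disposition="WRITE_APPEND",
)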
However, if you really need your schema to use the NUMERIC datatype, you can convert the DataFrame value of 1 in your Python code with the decimal library (as derived from this SO post) before loading it to BigQuery.
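The core of that conversion is mapping the id column to decimal.Decimal objects; a minimal sketch (the column name id is taken from the question):

import decimal
import pandas as pd

df = pd.DataFrame({"id": [1]})
# A NUMERIC column must hold Decimal values so it can be serialized as a 16-byte decimal.
df["id"] = df["id"].astype(str).map(decimal.Decimal)
print(type(df["id"].iloc[0]))  # <class 'decimal.Decimal'>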
For a complete example, you may refer to the sample code below.
from google.cloud import bigquery
import pandas
import zlib
import decimal
# Construct a BigQuery client object.
client = bigquery.Client()
# Set table_id to the ID of the table to create.
table_id = "my-project.my-dataset.my-table"
string = 'abs'
df = pandas.DataFrame()
data = zlib.compress(bytearray(string, encoding='utf-8'), -1)
record = df.append({'id' : 1, 'data' : data}, ignore_index=True)
df_2 = pandas.DataFrame(record)
# Convert the id column to decimal.Decimal so it can be loaded into the NUMERIC column.
df_2['id'] = df_2['id'].astype(str).map(decimal.Decimal)
dataframe = pandas.DataFrame(
    df_2,
    # In the loaded table, the column order reflects the order of the
    # columns in the DataFrame.
    columns=[
        "id",
        "data",
    ],
)
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("id", bigquery.enums.SqlTypeNames.NUMERIC),
        bigquery.SchemaField("data", bigquery.enums.SqlTypeNames.BYTES),
    ],
    write_disposition="WRITE_APPEND",
)
job = client.load_table_from_dataframe(
    dataframe, table_id, job_config=job_config
)  # Make an API request.
job.result()  # Wait for the job to complete.
table = client.get_table(table_id) # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)
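As a quick sanity check (not part of the original answer), you can query the row back and decompress the BYTES value; this reuses the client, table_id, and zlib import from the sample above and assumes the load succeeded:

# Read the row back and verify the zlib-compressed payload round-trips.
query = "SELECT id, data FROM `{}` WHERE id = 1".format(table_id)
for row in client.query(query).result():
    print(row["id"], zlib.decompress(row["data"]).decode("utf-8"))  # expected: 1 abs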