bq load command to load Parquet file from GCS to BigQuery with column names starting with a number
I am loading a Parquet file into BigQuery using the bq load command. My Parquet file contains column names that start with a number (e.g. 00_abc, 01_xyz). Since BigQuery doesn't support column names starting with a number, I have created columns in BigQuery such as _00_abc, _01_xyz. But I am unable to load the Parquet file into BigQuery using bq load.
Is there any way to tell bq load that the source column 00_abc (from the Parquet file) should be loaded into the target column _00_abc (in BigQuery)?
Thanks in advance.
Regards, Gouranga Basak
Solution 1:[1]
It's general best practice not to start a Parquet column name with a number. You will experience compatibility issues with more than just bq load. For example, many Parquet readers use the parquet-avro library, and Avro's documentation says:
The name portion of a fullname, record field names, and enum symbols must:
- start with [A-Za-z_]
- subsequently contain only [A-Za-z0-9_]
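Those two rules amount to a simple regular expression. As a quick sketch (the function name is my own, not part of any library), you can check whether a column name is Avro-safe like this:

```python
import re

# Avro's naming rule quoted above: first character in [A-Za-z_],
# any subsequent characters in [A-Za-z0-9_].
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_avro_name(name: str) -> bool:
    """Return True if the name satisfies Avro's naming rule."""
    return AVRO_NAME.fullmatch(name) is not None

print(is_valid_avro_name("00_abc"))   # starts with a digit -> False
print(is_valid_avro_name("_00_abc"))  # leading underscore -> True
```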
The solution here is to rename the columns in the Parquet file. Depending on how much control you have over the Parquet file's creation, you may need to write a Cloud Function to rename the columns (Pandas DataFrames won't complain about your column names).
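A minimal sketch of that rename, assuming the file is small enough to load with pandas; `sanitize_column` is a helper name of my own choosing, and the GCS paths in the comments are placeholders:

```python
import re

def sanitize_column(name: str) -> str:
    """Prefix an underscore when a column name starts with a digit,
    so e.g. '00_abc' becomes '_00_abc' to match the BigQuery schema."""
    return "_" + name if re.match(r"^[0-9]", name) else name

# Applied with pandas, the rename could look like:
#   df = pd.read_parquet("gs://your-bucket/input.parquet")
#   df = df.rename(columns=sanitize_column)
#   df.to_parquet("gs://your-bucket/renamed.parquet")

print(sanitize_column("00_abc"))  # _00_abc
print(sanitize_column("abc_01"))  # abc_01 (unchanged)
```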
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow