bq load command to load a Parquet file from GCS to BigQuery when column names start with a number

I am loading a Parquet file into BigQuery using the bq load command. My Parquet file contains column names that start with a number (e.g. 00_abc, 01_xyz). Since BigQuery doesn't support column names that start with a number, I have created the columns in BigQuery as _00_abc, _01_xyz. But I am unable to load the Parquet file into BigQuery using the bq load command.

Is there any way to tell the bq load command that source column 00_abc (from the Parquet file) should be loaded into target column _00_abc (in BigQuery)?

Thanks in advance.

Regards, Gouranga Basak



Solution 1:[1]

It's general best practice to not start a Parquet column name with a number. You will experience compatibility issues with more than just bq load. For example, many Parquet readers use the parquet-avro library, and Avro's documentation says:

The name portion of a fullname, record field names, and enum symbols must:

  • start with [A-Za-z_]
  • subsequently contain only [A-Za-z0-9_]
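As a rough illustration, the two rules quoted above can be expressed as a single regular expression. The helper name `is_valid_avro_name` is made up for this sketch and is not part of any Avro library:

```python
import re

# First character: [A-Za-z_]; every following character: [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_valid_avro_name(name: str) -> bool:
    """Hypothetical helper: check a field name against the quoted rules."""
    return AVRO_NAME.fullmatch(name) is not None

print(is_valid_avro_name("00_abc"))   # the column name from the question
print(is_valid_avro_name("_00_abc"))  # the same name with an underscore prefix
```

The name from the question fails the check because its first character is a digit, while the underscore-prefixed variant passes.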

The solution here is to rename the columns in the Parquet file before loading it. Depending on how much control you have over the Parquet file's creation, you may need to write a Cloud Function to rename the columns (pandas DataFrames won't complain about your column names).
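A minimal sketch of that renaming step, assuming pandas with a Parquet engine such as pyarrow is installed. The function names `bq_safe_name` and `rename_parquet_columns` and the underscore-prefix convention are illustrative choices for this example, not anything bq load requires:

```python
import re

def bq_safe_name(name: str) -> str:
    """Hypothetical helper: make a column name start with a letter or underscore."""
    # Replace any character outside [A-Za-z0-9_] with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Prefix an underscore if the name is empty or starts with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned

def rename_parquet_columns(src_path: str, dst_path: str) -> None:
    """Rewrite a Parquet file with sanitized column names (paths are placeholders)."""
    import pandas as pd  # assumption: pandas and a Parquet engine are available

    df = pd.read_parquet(src_path)
    # DataFrame.rename accepts a function that is applied to each column label.
    df = df.rename(columns=bq_safe_name)
    df.to_parquet(dst_path, index=False)
```

With this convention, 00_abc becomes _00_abc and 01_xyz becomes _01_xyz, matching the target columns from the question. Two distinct source names could sanitize to the same result, so it is worth checking for duplicates before writing the new file.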

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1