'Make existing bigquery table clustered
I have a quite huge existing partitioned table in bigquery. I want to make the table clustered, at least for the new partition.
From the documentation: https://cloud.google.com/bigquery/docs/creating-clustered-tables, it is said that we are able to Creating a clustered table when you load data
and I have tried to load a new partition using clustering fields: job_config.clustering_fields = ["event_type"]
.
The load finished successfully, however it seems that the new partition is not clustered (I am not really sure how to check whether it is clustered or not, but when I query to that particular partition it would always scan all rows).
Is there a good way to make clustering field for an existing partitioned table?
Any comment, suggestion, or answer is well appreciated.
Thanks a lot, Yosua
Solution 1:[1]
This answer is no longer valid / correct
https://cloud.google.com/bigquery/docs/creating-clustered-tables#modifying-cluster-spec
You can only specify clustering columns when a table is created
So, obviously you cannot expect existing non-clustered table and especially just new partitions to become clustered
The "workaround" is to create new table to be properly partitioned / clustered and load data into it from Google Cloud Storage (GCS). You can export data from original table into GCS first for this so whole process will be free of charge
Solution 2:[2]
BigQuery supports changing an existing non-clustered table to a clustered table and vice versa. You can also update the set of clustered columns of a clustered table.
You can change the clustering specification in the following ways:
Call the tables.update
or tables.patch
API method.
Call the bq command-line tool's bq update
command with the --clustering_fields
flag.
Reference
https://cloud.google.com/bigquery/docs/creating-clustered-tables#modifying-cluster-spec
Solution 3:[3]
What I missed from the above answers was a real example, so here it goes:
bq update --clustering_fields=tool,qualifier,user_id my_dataset.my_table
Where tool, qualifier and user_id are the three columns I want the table to be clustered by (in that order) and the table is my_dataset.my_table.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | dogmatic69 |
Solution 2 | SANN3 |
Solution 3 | Leonardo Campos |