Partitioning BigQuery Tables via API in Python
I'm using Python to hit the BigQuery API. I've been successful at running queries and writing new tables, but would like to ensure those output tables are partitioned per https://cloud.google.com/bigquery/docs/creating-partitioned-tables
The output of the query has two columns: event_date (a string in the format "2017-11-12") and metric (an integer).
Per the code below, I've been assigning the "partitioning_type" attribute on various objects, but it never raises an error.
(It would also be useful to know how to tell whether my partitioning efforts are actually working, i.e. how to identify the _PARTITIONTIME pseudo column.)
```python
import uuid

from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials and project
dest_dataset_id = "my_dataset"  # placeholder; `query` is defined elsewhere
dest_table_id = "BQresults"

# run_async_query is the older (pre-0.28) google-cloud-bigquery API
query_job = client.run_async_query(str(uuid.uuid4()), query)
query_job.allow_large_results = True
dest_dataset = client.dataset(dest_dataset_id)
dest_table = dest_dataset.table(dest_table_id)
dest_table.partitioning_type = "DAY"
query_job.destination = dest_table
query_job.write_disposition = 'WRITE_TRUNCATE'
query_job.use_legacy_sql = False
query_job.begin()
query_job.result()
```
Solution 1:[1]
If you want to check whether a table is partitioned on a time column, use the get_table() method and inspect the partitioning_type property of the returned object.
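A minimal sketch of that check with a recent google-cloud-bigquery client (the dataset and table names below are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()

# "my_dataset.BQresults" is a placeholder; substitute your own table path.
table = client.get_table("my_dataset.BQresults")

print(table.time_partitioning)  # e.g. TimePartitioning(type_='DAY'), or None
print(table.partitioning_type)  # shorthand for the type, e.g. 'DAY'
```

If the table is ingestion-time partitioned (partitioned by day with no explicit column), you can also query the _PARTITIONTIME pseudo column directly, e.g. SELECT _PARTITIONTIME AS pt, COUNT(*) FROM my_dataset.BQresults GROUP BY pt.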
You can check for integer-range partitioning via the range_partitioning property. You can also fetch the job object with get_job(), passing the job ID, and check whether time_partitioning was set in its configuration.
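For example, to inspect a finished job (the job ID is a placeholder):

```python
from google.cloud import bigquery

client = bigquery.Client()

# "job_abc123" is a placeholder; use the ID of your query job.
job = client.get_job("job_abc123")

# None means no time partitioning was set in the job configuration.
print(job.time_partitioning)
```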
I don't think the query job you're running produces a partitioned table, since time_partitioning should be set in the job configuration, and it seems the client you're using doesn't do this. If that's the case, you can create the partitioned table first and use that existing table as the destination.
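In the rewritten client (0.28+), where run_async_query was replaced by client.query(), the partitioning can be set through the job configuration. A hedged sketch, with placeholder names, is below; note that column-based partitioning requires a DATE or TIMESTAMP column, so a string event_date would need to be parsed in the query first:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholders: substitute your own dataset, table, and query.
job_config = bigquery.QueryJobConfig()
job_config.destination = client.dataset("my_dataset").table("BQresults")
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
# Day-partition the destination on event_date; omit `field` to partition
# by ingestion time (_PARTITIONTIME) instead.
job_config.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",
)

sql = """
    SELECT PARSE_DATE('%Y-%m-%d', event_date) AS event_date, metric
    FROM `my_dataset.source_table`  -- placeholder source table
"""
query_job = client.query(sql, job_config=job_config)
query_job.result()  # block until the job completes
```

Alternatively, as suggested above, you can create the day-partitioned table yourself first (e.g. a bigquery.Table with time_partitioning set, passed to client.create_table()) and point the query job at it.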
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | dmigo |