'Get Table_Id along with Rowcount for all Tables in a Project

There are >100 datasets in one of my project and I want to get the Table_id * No_of_rows of each table lying in these 50 datasets.

I can get the metadata of a Dataset using the following query.

SELECT
  Project_id,
  dataset_id,
  table_id,
  row_count
FROM
  Project.dataset.__TABLES__

I can do this for each and every dataset (But that would take 100 iterations). Is there a better way??



Solution 1:[1]

You cannot do this in a single query, but, I did this python script for you:

from google.cloud import bigquery
client = bigquery.Client()

datasets = list(client.list_datasets())
project = client.project

if datasets:
    totalTables=0
    print('Datasets in project {}:'.format(project))
    for dataset in datasets:  # API request(s)
        print('Dataset: {}'.format(dataset.dataset_id))

        query_job = client.query("SELECT Project_id, dataset_id, table_id, row_count FROM "+dataset.dataset_id+".__TABLES__")

        results = query_job.result()
        for row in results:
            print("\tTable: {} : {}".format(row.table_id, row.row_count))

            totalTables=totalTables+int(row.row_count)
    print("\n\nTotal number of rows: {}".format(totalTables))
else:
    print('{} project does not contain any datasets.'.format(project))

Just run it (in the Cloud Shell for example).

EDIT I created this Feature Request to have the __DATASETS__ option in BigQuery. It should be great to run your command in all your datasets just setting FROM __DATASETS__.__TABLES__ right?

Solution 2:[2]

I realize this question is old, but did find a way to get table_id and row_count from multiple datasets in a BigQuery project and wanted to share.

    DECLARE tables_to_check Array<Struct<dataset_id STRING, project_id String>>;
    DECLARE i INT64 DEFAULT 0;
    
    -- store project and dataset ids in a struct
    set tables_to_check = Array(
    SELECT distinct as Struct table_schema as dataset_id, table_catalog as project_id
    FROM region-us.INFORMATION_SCHEMA.TABLES);
    
    create temp table results(dataset_id String, table_id String, row_count INT64);
    
    LOOP set i = i+1;
    if i > ARRAY_LENGTH(tables_to_check) THEN LEAVE;
    END IF;
    EXECUTE IMMEDIATE '''
    insert results
            SELECT "''' || tables_to_check[ORDINAL(i)].dataset_id || '''" as dataset_id, 
            table_id,
            row_count as row_count 
            from `''' || tables_to_check[ORDINAL(i)].project_id || '''.''' || tables_to_check[ORDINAL(i)].dataset_id || '''.__TABLES__`
    '''
    ;
    END LOOP;
    
    -- choose how to view the results
    select * from results order by row_count desc;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2